How many (balanced) BSTs can be generated from a sorted list - binary-search-tree

Is there a generalized formula that calculates the number of (balanced) BSTs that can be generated from a sorted list, like (1,2,3,4,5,6,7,8)? Thanks.

Related

Why is the time complexity of binary search logN but the time complexity of a BST is N?

In Algorithms, 4th Edition by Robert Sedgewick, a table of time complexities for different algorithms is given.
Based on that table, the searching time complexity of a BST is N, while binary search by itself is logN.
What is the difference between the two? I have seen explanations about these separately and they made sense, however, I can't seem to understand why the searching time complexity of a BST isn't logN, as we are searching by continually breaking the tree in half and ignoring the other parts.
From binary-search-trees-bst-explained-with-examples
...on average, each comparison allows the operations to skip about half of the tree, so that each lookup, insertion or deletion takes time proportional to the logarithm of the number of items stored in the tree, O(log n). However, sometimes the worst case can happen, when the tree isn't balanced and the time complexity is O(n) for all three of these functions.
So, you kind of expect log(N) but it's not absolutely guaranteed.
the searching time complexity of a BST is N, and of binary search in and of itself is logN. What is the difference between the two?
The difference is that a binary search on a sorted array always starts at the middle element (i.e. the median when n is odd). This cannot be guaranteed in a BST. The root might be the middle element, but it doesn't have to be.
For instance, this is a valid BST:
        10
       /
      8
     /
    5
   /
  2
 /
1
...but it is not a balanced one, and so the process of finding the value 1, given the root of that tree, will include visiting all its nodes. If, however, the same values were presented in a sorted list (1, 2, 5, 8, 10), a binary search would start at 5 and never visit 8 or 10.
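To make the contrast concrete, here is a small sketch in Python (the node class and counting helpers are invented for illustration) of looking up 1 in that degenerate chain versus binary-searching the sorted list (1, 2, 5, 8, 10):

# Minimal BST node; the names here are illustrative only.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def bst_search(node, key):
    comparisons = 0
    while node is not None:
        comparisons += 1
        if key == node.key:
            return comparisons
        node = node.left if key < node.key else node.right
    return comparisons

def binary_search(sorted_list, key):
    lo, hi, comparisons = 0, len(sorted_list) - 1, 0
    while lo <= hi:
        comparisons += 1
        mid = (lo + hi) // 2
        if sorted_list[mid] == key:
            return comparisons
        elif sorted_list[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return comparisons

# The degenerate BST from above: 10 -> 8 -> 5 -> 2 -> 1, all left children.
chain = Node(10, Node(8, Node(5, Node(2, Node(1)))))
print(bst_search(chain, 1))                 # 5 comparisons: every node is visited
print(binary_search([1, 2, 5, 8, 10], 1))   # 2 comparisons: 5, then 1; 8 and 10 are never visited

In a balanced tree the two behave the same; it is only the lack of a balance guarantee that makes the BST's worst case linear.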
Adding self-balancing trees to the table
We can extend the given table with self-balancing search trees, like AVL, and then we get this:
implementation                      search  insert  delete
sequential search (unordered list)  N       N       N
binary search (ordered array)       lg N    N       N
BST                                 N       N       N
AVL                                 lg N    lg N    lg N

Searching for groups of objects given a reduction function

I have a few questions about a type of search.
First, is there a name for the following type of search (and if so, what is it)? I want to search for subsets of objects from some collection such that a filter applied to a reduction of the subset evaluates to true. For example, say I have the following objects, each of which has an id and a value.
[A,10]
[B,10]
[C,10]
[D,9]
[E,11]
I want to search for "all the sets of objects whose summed values equal 30" and I would expect the output to be, {{A,B,C}, {A,D,E}, {B,D,E}, {C,D,E}}.
Second, is the only strategy to perform this search brute-force? Is there some type of general-purpose algorithm for this? Or are search optimizations dependent on the reduction function?
Third, if you came across this problem, what tools would you use to solve it in a general way? Assume the reduction and filter functions could be anything and are not necessarily the sum function. Does SQL provide a good API for this type of search? What about Prolog? Any interesting tips and tricks would be appreciated.
Thanks.
I cannot comment on the problem in general, but a brute-force search can easily be done in Prolog.
w(a,10).
w(b,10).
w(c,10).
w(d,9).
w(e,11).
solve(0, [], _).
solve(N, [X], [X|_]) :- w(X, N).
solve(N, [X|Xs], [X|Bs]) :-
    w(X, W),
    W < N,
    N1 is N - W,
    solve(N1, Xs, Bs).
solve(N, [X|Xs], [_|Bs]) :-    % skip element if previous clause fails
    solve(N, [X|Xs], Bs).
Which gives
| ?- solve(30, X, [a, b, c, d, e]).
X = [a,b,c] ? ;
X = [a,d,e] ? ;
X = [b,d,e] ? ;
X = [c,d,e] ? ;
(1 ms) no
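For comparison, the same brute-force enumeration can be sketched generically in Python, with the reduction and the filter passed in as plain functions (the function and variable names here are invented for illustration):

from itertools import combinations

# The (id, value) objects from the question.
items = [("A", 10), ("B", 10), ("C", 10), ("D", 9), ("E", 11)]

def search_subsets(objects, reduce_fn, predicate):
    # Brute force: reduce every non-empty subset and keep it if the predicate holds.
    for size in range(1, len(objects) + 1):
        for subset in combinations(objects, size):
            if predicate(reduce_fn(subset)):
                yield {obj_id for obj_id, _ in subset}

# "All the sets of objects whose summed values equal 30"
hits = list(search_subsets(items,
                           reduce_fn=lambda s: sum(v for _, v in s),
                           predicate=lambda total: total == 30))
print(hits)   # the four expected sets: {A,B,C}, {A,D,E}, {B,D,E}, {C,D,E}

Because the reduction and the predicate are arbitrary, trying all 2^N subsets is the only general strategy; any pruning (like the W < N check in the Prolog version) relies on knowing something about the reduction, e.g. that summing positive values is monotonic.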
SQL is terrible at this kind of problem. Until recently there was no way to get 'all combinations' of row elements. Now you can do so with recursive common table expressions, but their limitations force you to retain all partial results alongside the final results, and you have to filter the partial results out at the end. About the only benefit of SQL's recursive procedure is that you can stop evaluating a combination once a sub-path exceeds 30, your target total. That makes it slightly less ugly than an 'evaluate all 2^N combinations' brute-force solution (unless every combination sums to less than the target total).
To solve this with SQL you would be running an algorithm that can be described as:
Seed your result set with all table entries less than your target total and their value as a running sum.
Iteratively join your prior result with all combinations of table that were not already used in the result set and whose value added to running sum is less than or equal to target total. Running sum becomes old running sum plus value, and append ID to ID LIST. Union this new result to the old results. Iterate until no more records qualify.
Make a final pass of the result set to filter out the partial sums that do not total to your target.
Oh, and unless you make special provisions, solutions {A,B,C}, {C,B,A}, and {A,C,B} all look like different solutions (order is significant).
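The iterative procedure described above (seed with single rows, extend while the running sum stays at or below the target, keep only exact totals) can also be sketched outside SQL. Here is a rough Python version of that frontier expansion, including one way to make the special provision just mentioned: each partial result is only extended with later elements, so {A,B,C} and {C,B,A} are never generated as separate solutions. The pruning assumes positive values, as the "sub-path exceeds 30" argument does.

# Sketch of the seed / extend / filter procedure from the answer above.
items = [("A", 10), ("B", 10), ("C", 10), ("D", 9), ("E", 11)]
target = 30

# Seed: every single entry whose value does not exceed the target, with its running sum.
frontier = [([i], value) for i, (_, value) in enumerate(items) if value <= target]
results = []

while frontier:
    new_frontier = []
    for used, running in frontier:
        if running == target:
            results.append({items[i][0] for i in used})
        # Extend only with elements *after* the last used index, so each
        # combination is generated once regardless of order.
        for j in range(used[-1] + 1, len(items)):
            if running + items[j][1] <= target:
                new_frontier.append((used + [j], running + items[j][1]))
    frontier = new_frontier

print(results)   # the four target subsets: {A,B,C}, {A,D,E}, {B,D,E}, {C,D,E}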

What would be the binary search complexity to find second largest number in array

Can someone explain how to calculate the complexity of using binary search to find the second largest number in an array?
Binary search is done on a sorted array.
If you already have a sorted array, why do you need to do anything at all?
The second to last number in the array (sorted in ascending order) would be the second largest number (O(1)).
If the array contains duplicates:
For example,
{0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,... }
The time complexity would be O(log n) where n is the number of elements in the array.
The largest number is the one at the last index (call it x). You can use binary search to find the first index at which x appears; the element immediately before that index is the second largest number in the array (the largest value strictly smaller than x).
If you are using C++, std::lower_bound / std::upper_bound can be used for this kind of boundary search.
Binary search for an element with any given property is always logarithmic, provided that you can determine in constant time whether that property holds.
If the array can’t contain duplicates, you don’t need a binary search and the complexity is constant.
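A small Python sketch of that boundary search, treating "second largest" as the largest value strictly smaller than the maximum, as the answer above does:

from bisect import bisect_left

def second_largest(sorted_arr):
    # Second largest distinct value in an ascending sorted array, in O(log n):
    # find where the run of the maximum begins, then look just before it.
    if not sorted_arr:
        raise ValueError("empty array")
    first_of_max = bisect_left(sorted_arr, sorted_arr[-1])
    if first_of_max == 0:
        raise ValueError("all elements are equal; there is no second largest")
    return sorted_arr[first_of_max - 1]

print(second_largest([0, 1, 1, 1, 1, 1, 1, 1]))   # 0
print(second_largest([1, 2, 5, 8, 10]))           # 8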
One way to do it in Python is to convert the list (which allows duplicates) to a set (which does not allow duplicates), sort the unique values, and fetch the item at index [-2]. Note, though, that the conversion is O(n), the sort is O(n log n), and a Python set is unordered, so it cannot simply be indexed at [-2] even if the original list was sorted in ascending order.

Scala: Find edit-distance for all elements of a list not fitting in memory

In my previous question I was asking for advice on an algorithm to compare all elements in huge list:
Scala: Compare all elements in a huge list
A more general problem I am facing, and would be grateful for advice on, is to do an approximate comparison of list elements for a list that does not fit into memory all at once. I am building this list from an SQL request that returns a cursor over a single string field of about 70 000 000 records. I need to find the edit distance (http://en.wikipedia.org/wiki/Edit_distance) between every two string elements in this list.
My idea is to use a sliding window of N records to compare all 70 000 000 records:
1. Read N elements into a list that nicely fits into memory (N ~ 10 000).
2. Calculate the edit distance between all elements in this list, using the algorithm described in Scala: Compare all elements in a huge list.
3. Read the next N elements (from N to 2N-1) into a new list, and compare all of these as in 2.
4. Rewind the SQL query cursor to the first record.
5. Compare every string from index 0 to N-1 with all strings in this new list, using the same algorithm as in 2.
6. Slide the window to read strings from 2N to 3N-1 into a new list.
7. Compare every string from index 0 to 2N-1 with all strings in this new list, using the same algorithm as in 2.
8. And so on.
All comparison results need to be written to the DB as (String, String, Distance) records, where the first two elements are the strings being matched and the third is the result.
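A rough sketch of this window-pairing scheme, written in Python rather than Scala just to illustrate the iteration order; fetch_window, edit_distance and store_result are placeholders (not names from the question) for the SQL cursor read, the distance algorithm from the linked question, and the DB write:

N = 10_000   # window size from the question

def compare_all(total_records, fetch_window, edit_distance, store_result):
    for window_start in range(0, total_records, N):
        current = fetch_window(window_start, N)

        # Compare all elements inside the current window with each other.
        for a in range(len(current)):
            for b in range(a + 1, len(current)):
                store_result(current[a], current[b], edit_distance(current[a], current[b]))

        # "Rewind" and compare every earlier record against the current window,
        # so every cross-window pair is compared exactly once.
        for prev_start in range(0, window_start, N):
            previous = fetch_window(prev_start, N)
            for s in previous:
                for t in current:
                    store_result(s, t, edit_distance(s, t))

However the work is organized, this still computes every pairwise distance; the windowing only bounds memory, not the number of comparisons.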
Questions:
How to force Scala to garbage collect unneeded lists from the previous steps of this algorithm?
This algorithm is awful in terms of the number of calculations required to do the job. Any other algorithms or ideas on how to reduce the complexity?
Thanks!

Redis Sorted Sets: How do I get the first intersecting element?

I have a number of large sorted sets (5m-25m) in Redis and I want to get the first element that appears in a combination of those sets.
e.g. I have 20 sets and want to take sets 1, 5, 7 and 12 and get only the first element of the intersection of those sets.
It would seem that a ZINTERSTORE followed by a "ZRANGE foo 0 0" would be doing a lot more work than I require, as it would calculate the entire intersection and then return only the first element. Is there an alternative solution that does not need to calculate the full intersection?
There is no direct, native alternative, although I'd suggest this:
Create a hash whose members are your elements. Upon each addition to one of your sorted sets, increment the relevant member (using HINCRBY). Of course, you should make the increment only after checking that the element does not already exist in the sorted set you are adding to.
That way, you can quickly know which elements appear in 4 sets.
UPDATE: Now that I think about it again, it might be too expensive to query the hash to find items with a value of 4 (O(n)). Another option would be to create another sorted set whose members are your elements and whose scores get incremented (as described above, but using ZINCRBY); then you can quickly pull all elements with a score of 4 (using ZRANGEBYSCORE).
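A minimal sketch of that second (counter sorted set) approach using the redis-py client, assuming redis-py 3.x where zincrby takes (name, amount, value); the key names are invented:

import redis

r = redis.Redis()

COUNTER_KEY = "appearance_counts"   # the extra sorted set suggested above; name is made up

def add_to_sorted_set(set_key, member, score):
    # Add member to one of the large sorted sets and bump its appearance counter,
    # but only if it was not already present in that particular set.
    if r.zscore(set_key, member) is None:
        r.zadd(set_key, {member: score})
        r.zincrby(COUNTER_KEY, 1, member)

def members_in_n_sets(n):
    # Members whose counter equals n, e.g. n=4 for elements that appear in four sets.
    return r.zrangebyscore(COUNTER_KEY, n, n)

Note that the check-then-add pair is not atomic as written; with concurrent writers you would likely wrap it in a WATCH/MULTI pipeline or a small Lua script.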
UPDATE: Now that I rethink about it, it might be too expensive to query your hash to find items with value of 4 (O(n)). Another option would be creating another Sorted Set, which its members are your elements, and their score gets incremented (as I described before, but using ZINCRBY), and you can quickly pull all elements with score 4 (using ZRANGEBYSCORE).