Find all pairs of consecutive numbers in BST - binary-search-tree

I need to write a code that will find all pairs of consecutive numbers in BST.
For example: let's take the BST T with key 9, T.left.key = 8, T.right.key = 19. There is only one pair - (8, 9).
The naive solution that I thought about is to do any traversal (pre, in, post) on the BST and for each node to find its successor and predecessor, and if one or two of them are consecutive to the node - we'll print them. But the problem is that it'll will the O(n^2), because we have n nodes and for each one of them we use function that takes O(h), that in the worst case h ~ n.
Second solution is to copy all the elements to an array, and to find the consecutive numbers in the array. Here we use O(n) additional space, but the runtime is better - O(n).
Can you help me to find an efficient algorithm to do it? I'm trying to think about algorithm that don't use additional space, and its runtime is better than O(n^2)
*The required output is the number of those pairs (No need to print the pairs).
*any 2 consecutive integers in the BST is a pair.
*The BST containts only integers.
Thank you!

Why don't you just do an inorder traversal and count pairs on the fly? You'll need a global variable to keep track of the last number, and you'll need to initialize it to something which is not one less than the first number (e.g. the root of the tree). I mean:
// Last item
int last;
// Recursive function for in-order traversal
int countPairs (whichever_type treeRoot)
{
int r = 0; // Return value
if (treeRoot.leftChild != null)
r = r + countPairs (treeRoot.leftChild);
if (treeRoot.value == last + 1)
r = r + 1;
last = treeRoot.value;
if (treeRoot.rightChild != null)
r = r + countPairs (treeRoot.rightChild);
return r; // Edit 2016-03-02: This line was missing
}
// Main function
main (whichever_type treeRoot)
{
int r;
if (treeRoot == null)
r = 0;
else
{
last = treeRoot.value; // to make sure this is not one less than the lowest element
r = countPairs (treeRoot);
}
// Done. Now the variable r contains the result
}

Related

Trade off between Linear and Binary Search

I have a list of elements to be searched in a dataset of variable lengths. I have tried binary search and I found it is not always efficient when the objective is to search a list of elements.
I did the following study and conclude that if the number of elements to be searched is less than 5% of the data, binary search is efficient, other wise the Linear search is better.
Below are the details
Number of elements : 100000
Number of elements to be searched: 5000
Number of Iterations (Binary Search) =
log2 (N) x SearchCount=log2 (100000) x 5000=83048
Further increase in the number of search elements lead to more iterations than the linear search.
Any thoughts on this?
I am calling the below function only if the number elements to be searched is less than 5%.
private int SearchIndex(ref List<long> entitylist, ref long[] DataList, int i, int len, ref int listcount)
{
int Start = i;
int End = len-1;
int mid;
while (Start <= End)
{
mid = (Start + End) / 2;
long target = DataList[mid];
if (target == entitylist[listcount])
{
i = mid;
listcount++;
return i;
}
else
{
if (target < entitylist[listcount])
{
Start = mid + 1;
}
if (target > entitylist[listcount])
{
End = mid - 1;
}
}
}
listcount++;
return -1; //if the element in the list is not in the dataset
}
In the code I retun the index rather than the value because, I need to work with Index in the calling function. If i=-1, the calling function resets the value to the previous i and calls the function again with a new element to search.
In your problem you are looking for M values in an N long array, N > M, but M can be quite large.
Usually this can be approached as M independent binary searches (or even with the slight optimization of using the previous result as a starting point): you are going to O(M*log(N)).
However, using the fact that also the M values are sorted, you can find all of them in one pass, with linear search. In this case you are going to have your problem O(N). In fact this is better than O(M*log(N)) for M large.
But you have a third option: since M values are sorted, binary split M too, and every time you find it, you can limit the subsequent searches in the ranges on the left and on the right of the found index.
The first look-up is on all the N values, the second two on (average) N/2, than 4 on N/4 data,.... I think that this scale as O(log(M)*log(N)). Not sure of it, comments welcome!
However here is a test code - I have slightly modified your code, but without altering its functionality.
In case you have M=100000 and N=1000000, the "M binary search approach" takes about 1.8M iterations, that's more that the 1M needed to scan linearly the N values. But with what I suggest it takes just 272K iterations.
Even in case the M values are very "collapsed" (eg, they are consecutive), and the linear search is in the best condition (100K iterations would be enough to get all of them, see the comments in the code), the algorithm performs very well.

Calculating time complexity using master method

Can anyone explain the time complexity of the below using the master method?
int sum(Node node) {
if (node == null) {
return 0;
}
return sum(node.left) + node.value + sum(node.right);
}
I know a's value is 2 but its hard for me to identify b and d. Is b=1 and d=cO(n)? In that case can anyone explain how b and d should be identified
well, to make the recurrence relation less complicated, we can assume a balanced binary tree that has 2^inodes, so we obtain a recurrence of T(n) = 2T(n/2) + 1 (ignoring the base case).
From above, we can find a = 2, b = 2, and c = 0, since 1 is O(1). Applying the master method, it passes case 1 and we can get our complexity as T(n) = Θ(nlog22) or O(n)
This is a function used to sum up all nodes in a binary tree. First down from root to leave and then goes up (stack unwinding). So, time complexity is O(N), as it needs to visit each node at least once.

Get the most occuring number amongst several integers without using arrays

DISCLAIMER: Rather theoretical question here, not looking for a correct answere, just asking for some inspiration!
Consider this:
A function is called repetitively and returns integers based on seeds (the same seed returns the same integer). Your task is to find out which integer is returned most often. Easy enough, right?
But: You are not allowed to use arrays or fields to store return values of said function!
Example:
int mostFrequentNumber = 0;
int occurencesOfMostFrequentNumber = 0;
int iterations = 10000000;
for(int i = 0; i < iterations; i++)
{
int result = getNumberFromSeed(i);
int occurencesOfResult = magic();
if(occurencesOfResult > occurencesOfMostFrequentNumber)
{
mostFrequentNumber = result;
occurencesOfMostFrequentNumber = occurencesOfResult;
}
}
If getNumberFromSeed() returns 2,1,5,18,5,6 and 5 then mostFrequentNumber should be 5 and occurencesOfMostFrequentNumber should be 3 because 5 is returned 3 times.
I know this could easily be solved using a two-dimentional list to store results and occurences. But imagine for a minute that you can not use any kind of arrays, lists, dictionaries etc. (Maybe because the system that is running the code has such a limited memory, that you cannot store enough integers at once or because your prehistoric programming language has no concept of collections).
How would you find mostFrequentNumber and occurencesOfMostFrequentNumber? What does magic() do?? (Of cause you do not have to stick to the example code. Any ideas are welcome!)
EDIT: I should add that the integers returned by getNumber() should be calculated using a seed, so the same seed returns the same integer (i.e. int result = getNumber(5); this would always assign the same value to result)
Make an hypothesis: Assume that the distribution of integers is, e.g., Normal.
Start simple. Have two variables
. N the number of elements read so far
. M1 the average of said elements.
Initialize both variables to 0.
Every time you read a new value x update N to be N + 1 and M1 to be M1 + (x - M1)/N.
At the end M1 will equal the average of all values. If the distribution was Normal this value will have a high frequency.
Now improve the above. Add a third variable:
M2 the average of all (x - M1)^2 for all values of xread so far.
Initialize M2 to 0. Now get a small memory of say 10 elements or so. For every new value x that you read update N and M1 as above and M2 as:
M2 := M2 + (x - M1)^2 * (N - 1) / N
At every step M2 is the variance of the distribution and sqrt(M2) its standard deviation.
As you proceed remember the frequencies of only the values read so far whose distances to M1 are less than sqrt(M2). This requires the use of some additional array, however, the array will be very short compared to the high number of iterations you will run. This modification will allow you to guess better the most frequent value instead of simply answering the mean (or average) as above.
UPDATE
Given that this is about insights for inspiration there is plenty of room for considering and adapting the approach I've proposed to any particular situation. Here are some thoughts
When I say assume that the distribution is Normal you should think of it as: Given that the problem has no solution, let's see if there is some qualitative information I can use to decide what kind of distribution would the data have. Given that the algorithm is intended to find the most frequent number, it should be fine to assume that the distribution is not uniform. Let's try with Normal, LogNormal, etc. to see what can be found out (more on this below.)
If the game completely disallows the use of any array, then fine, keep track of only, say 10 numbers. This would allow you to count the occurrences of the 10 best candidates, which will give more confidence to your answer. In doing this choose your candidates around the theoretical most likely value according to the distribution of your hypothesis.
You cannot use arrays but perhaps you can read the sequence of numbers two or three times, not just once. In that case you can read it once to check whether you hypothesis about its distribution is good nor bad. For instance, if you compute not just the variance but the skewness and the kurtosis you will have more elements to check your hypothesis. For instance, if the first reading indicates that there is some bias, you could use a LogNormal distribution instead, etc.
Finally, in addition to providing the approximate answer you would be able to use the information collected during the reading to estimate an interval of confidence around your answer.
Alright, I found a decent solution myself:
int mostFrequentNumber = 0;
int occurencesOfMostFrequentNumber = 0;
int iterations = 10000000;
int maxNumber = -2147483647;
int minNumber = 2147483647;
//Step 1: Find the largest and smallest number that _can_ occur
for(int i = 0; i < iterations; i++)
{
int result = getNumberFromSeed(i);
if(result > maxNumber)
{
maxNumber = result;
}
if(result < minNumber)
{
minNumber = result;
}
}
//Step 2: for each possible number between minNumber and maxNumber, count occurences
for(int thisNumber = minNumber; thisNumber <= maxNumber; thisNumber++)
{
int occurenceOfThisNumber = 0;
for(int i = 0; i < iterations; i++)
{
int result = getNumberFromSeed(i);
if(result == thisNumber)
{
occurenceOfThisNumber++;
}
}
if(occurenceOfThisNumber > occurencesOfMostFrequentNumber)
{
occurencesOfMostFrequentNumber = occurenceOfThisNumber;
mostFrequentNumber = thisNumber;
}
}
I must admit, this may take a long time, depending on the smallest and largest possible. But it will work without using arrays.

Please explain this code for Merkle–Hellman knapsack cryptosystem?

This is the code snippet from a program that implements Merkle–Hellman knapsack cryptosystem.
// Generates keys based on input data size
private void generateKeys(int inputSize)
{
// Generating values for w
// This first value of the private key (w) is set to 1
w.addNode(new BigInteger("1"));
for (int i = 1; i < inputSize; i++)
{
w.addNode(nextSuperIncreasingNumber(w));
}
// Generate value for q
q = nextSuperIncreasingNumber(w);
// Generate value for r
Random random = new Random();
// Generate a value of r such that r and q are coprime
do
{
r = q.subtract(new BigInteger(random.nextInt(1000) + ""));
}
while ((r.compareTo(new BigInteger("0")) > 0) && (q.gcd(r).intValue() != 1));
// Generate b such that b = w * r mod q
for (int i = 0; i < inputSize; i++)
{
b.addNode(w.get(i).getData().multiply(r).mod(q));
}
}
Just tell me what is going on in the following lines:
do
{
r = q.subtract(new BigInteger(random.nextInt(1000) + ""));
}
while ((r.compareTo(new BigInteger("0")) > 0) && (q.gcd(r).intValue() != 1));
(1) Why is random number generated with upper bound 1000?
(2) Why is it subtracted from q?
The code is searching for a value that is co-prime with the already selected value q. In my opinion, it's doing so rather poorly, but you mention it's a simulator? I'm not sure what that means, but maybe it just means the code is quick and dirty rather than slow and secure.
Answering your questions directly:
Why is random number generated with upper bound 1000?
The Merkle-Hellman algorithm does indicate that r should be 'random'. The implementation for doing so is pretty haphazard; that might be what's thrown you off. The code is not technically an algorithm because the loop is not guaranteed to terminate. In theory, the psuedo-random candidate selection of r could be an arbitrarily long sequence of numbers which aren't co-prime to q, resulting in an infinite loop.
The upper bound of 1000 could be to ensure that the chosen r is sufficiently large. In general, large keys are harder to break than small keys, so if q is large, then this code will only find large r.
A more deterministic way to get a random co-prime would be to test each number lower than q, generating a list of co-primes and select one at random. This would probably be more secure, as an attacker knowing that q and r are within 1000 of each other would have a significantly reduced search space.
Why is it subtracted from q?
The subtraction is important because r must be less than q. The Merkle-Hellmen algorithm specifies it that way. I'm not convince that it needs to be that way. The public key is generated by multiplying each element in w by r and taking the modulus q. If r were very large, larger than q, it seems like it would further obfuscate q and each element in w.
The decryption step of Merkle-Hellmen, on the other hand, depends on the modular inverse of each encrypted letter a x r−1 mod q. This operation might be hampered by having r > q; it seems like it could still work out.
However, if nextInt can return 0, that iteration of the loop is a waste as a q and r must be different (gcd(a,a) is just a).
Breaking down the code:
do
Try it at least once. r is probably null or undefined before the method is called.
r = q.subtract(new BigInteger(random.nextInt(1000) + ""));
Find a candidate value that's between q and q - 1000.
while ((r.compareTo(new BigInteger("0")) > 0) && (q.gcd(r).intValue() != 1));
Keep going until you've found an r that is:
Greater than 0 r.compareTo(new BigInteger("0")) > 0, and
Is co-prime with q, q.gcd(r).intValue() != 1. Obviously, a randomly selected number is not guaranteed to be co-prime with another other number, so the randomly generated candidate might not be work for this q.
Does that clear it up? I have to admit that I'm not an expert on Merkle-Hellman.

Algorithm - find the minimal subtraction between sum of two arrays

I am hunting job now and doing many algorithm exercises. Here is my problem:
Given two arrays: a and b with same length, the subject is to make |sum(a)-sum(b)| minimal, by swapping elements between a and b.
Here is my though:
assume we swap a[i] and b[j], set Delt = sum(a) - sum(b), x = a[i]-b[j]
then Delt2 = sum(a)-a[i]+b[j] - (sum(b)-b[j]+a[i]) = Delt - 2*x,
then the change = |Delt| - |Delt2|, which is proportional to |Delt|^2 - |Delt2|^2 = 4*x*(Delt-x),
Based on the thought above I got the following code:
Delt = sum(a) - sum(b);
done = false;
while(!done)
{
done = true;
for i = [0, n)
{
for j = [0,n)
{
x = a[i]-b[j];
change = x*(Delt-x);
if(change >0)
{
swap(a[i], b[j]);
Delt = Delt - 2*x;
done = false;
}
}
}
}
However, does anybody have a much better solution ? If you got, please tell me and I would be very grateful of you!
This problem is basically the optimization problem for Partition Problem with an extra constraint of equal parts. I'll prove that adding this constraint doesn't make the problem easier.
NP-Hardness proof:
Assume there was an algorithm A that solves this problem in polynomial time, we can solve the Partition-Problem in polynomial time.
Partition(S):
for i in range(|S|):
S += {0}
result <- A(S\2,S\2) //arbitrary split S into 2 parts
if result is a partition: //simple to check, since partition is NP.
return true.
return false //no partition
Correctness:
If there is a partition denote as (S1,S2) [assume S2 has more elements], on iteration |S2|-|S1| [i.e. when adding |S2|-|S1| zeros]. The input to A will contatin enough zeros so we can return two equal length arrays: S2,S1+{0,0,...,0}, which will be a partition to S, and the algorithm will yield true.
If the algorithm yields true, and iteration k, we had two arrays: S2,S1, with same number of elements, and equal values. by removing k zeros from the arrays, we get a partition to the original S, so S had a partition.
Polynomial:
assume A takes P(n) time, the algorithm we produced will take n*P(n) time, which is also polynomial.
Conclusion:
If this problem is solveable in polynomial time, so does the Partion-Problem, and thus P=NP. based on this: this problem is NP-Hard.
Because this problem is NP-Hard, for an exact solution you will probably need an exponential algorith. One of those is simple backtracking [I leave it as an exercise to the reader to implement a backtracking solution]
EDIT: as mentioned by #jpalecek: by simply creating a reduction: S->S+(0,0,...,0) [k times 0], one can directly prove NP-Hardness by reduction. polynomial is trivial and correctness is very similar to the above partion's correctness proof: [if there is a partition, adding 'balancing' zeros is possible; the other direction is simply trimming those zeros]
Just a comment. Through all this swapping you can basically arrange the contents of both arrays as you like. So it is unimportant in which array the values are at start.
Can't do it in my head but I'm pretty sure there is a constructive solution. I think if you sort them first and then deal them according to some rule. Something along the lines If value > 0 and if sum(a)>sum(b) then insert to a else into b