Where to start with Binary Search Tree?

By my understanding, when performing a binary search you start with the middle value and apply a divide-and-conquer algorithm to it until you find the correct value.
However, when I looked at binary search trees, my understanding was that they work the same way, with the initial (root) node being the middle value. Yet I have also seen examples built from unsorted lists, where the first node is the first value in the array.
Which method is correct?
Thanks

Typically, you start with the middle element, then continue in either the left or the right half.
Divide and conquer algorithms approach the problem recursively by breaking the original problem into sub-problems of smaller size. The problem is reduced until it is small enough to be solved in a straightforward manner.
In the case of binary search on a sorted array, the algorithm takes the middle element, compares it with the value, and then recursively solves only the left or the right sub-problem, since the sorted order tells it which half can contain the value.
BinarySearch(Array arr, value)
    return BinarySearchAux(arr, value, 0, arr.length - 1)

BinarySearchAux(Array arr, value, start, end)
    if start > end
        return false
    mid = start + floor((end - start) / 2)
    if value == arr[mid]
        return true
    if value < arr[mid]
        return BinarySearchAux(arr, value, start, mid - 1)
    return BinarySearchAux(arr, value, mid + 1, end)
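The two examples in the question describe two different ways of building a binary search tree, and both are valid: building a balanced tree from a sorted array makes the middle element the root, while inserting the values of an unsorted list one at a time makes the first value the root (the tree is then only as balanced as the insertion order allows). Either way, searching starts at the root. A minimal Java sketch, assuming int keys; `BstDemo`, `Node`, `insert`, `fromSorted` and `contains` are hypothetical names for illustration:

```java
public class BstDemo {
    // Minimal BST node holding an int key.
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Insert keys in the given order: the first key inserted becomes the root.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        return root; // duplicates are ignored
    }

    // Build a balanced BST from a sorted array: the middle element becomes the root.
    static Node fromSorted(int[] sorted, int lo, int hi) {
        if (lo > hi) return null;
        int mid = lo + (hi - lo) / 2;
        Node n = new Node(sorted[mid]);
        n.left = fromSorted(sorted, lo, mid - 1);
        n.right = fromSorted(sorted, mid + 1, hi);
        return n;
    }

    // Search always starts at the root, however the tree was built.
    static boolean contains(Node root, int key) {
        Node cur = root;
        while (cur != null) {
            if (key == cur.key) return true;
            cur = key < cur.key ? cur.left : cur.right;
        }
        return false;
    }

    public static void main(String[] args) {
        Node a = null;
        for (int k : new int[]{5, 2, 8, 1, 9, 3}) a = insert(a, k);
        Node b = fromSorted(new int[]{1, 2, 3, 5, 8, 9}, 0, 5);
        System.out.println(a.key + " " + b.key);                 // 5 3
        System.out.println(contains(a, 3) + " " + contains(b, 4)); // true false
    }
}
```

Inserting already-sorted values with `insert` would produce a degenerate, list-like tree, which is why the middle-element construction is preferred when the input is already sorted.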


Binary search start or end is target

Why is it that when I see example code for binary search there is never an if statement to check if the start of the array or end is the target?
import java.util.Arrays;

public class App {
    public static int binary_search(int[] arr, int left, int right, int target) {
        if (left > right) {
            return -1;
        }
        int mid = (left + right) / 2;
        if (target == arr[mid]) {
            return mid;
        }
        if (target < arr[mid]) {
            return binary_search(arr, left, mid - 1, target);
        }
        return binary_search(arr, mid + 1, right, target);
    }

    public static void main(String[] args) {
        int[] arr = { 3, 2, 4, -1, 0, 1, 10, 20, 9, 7 };
        Arrays.sort(arr);
        for (int i = 0; i < arr.length; i++) {
            System.out.println("Index: " + i + " value: " + arr[i]);
        }
        // Search the whole array: left index 0, right index arr.length - 1.
        System.out.println(binary_search(arr, 0, arr.length - 1, -1));
    }
}
In this example, if the target were -1 or 20, the search would have to recurse several times. The code already has an if statement checking whether the target is at mid, so why not add two more checking whether it is at left or right?
EDIT:
As pointed out in the comments, I may have misinterpreted the initial question. The answer below assumes that OP meant having the start/end checks as part of each step of the recursion, as opposed to checking once before the recursion even starts.
Since I don't know for sure which interpretation was intended, I'm leaving this post here for now.
Original post:
You seem to be under the impression that "they added an extra check for mid, so surely they should also add an extra check for start and end".
The check "Is mid the target?" is in fact not a mere optimization they added. Recursively checking "mid" is the whole point of a binary search.
When you have a sorted array of elements, a binary search works like this:
1. Compare the middle element to the target.
2. If the middle element is smaller, throw away the first half.
3. If the middle element is larger, throw away the second half.
4. Otherwise, we found it!
5. Repeat until we either find the target or there are no more elements.
The act of checking the middle is fundamental to determining which half of the array to continue searching through.
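The steps above map directly onto a loop. A minimal iterative Java sketch (the class and method names are illustrative, not from the original posts), using the sorted version of the array from the question:

```java
public class BinarySearchSteps {
    // Iterative binary search over a sorted int array; returns the index of target, or -1.
    static int binarySearch(int[] sorted, int target) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {                    // repeat while elements remain
            int mid = lo + (hi - lo) / 2;     // avoids overflow of (lo + hi)
            if (sorted[mid] < target) {
                lo = mid + 1;                 // middle is smaller: drop the first half
            } else if (sorted[mid] > target) {
                hi = mid - 1;                 // middle is larger: drop the second half
            } else {
                return mid;                   // found it
            }
        }
        return -1;                            // no elements left
    }

    public static void main(String[] args) {
        int[] arr = {-1, 0, 1, 2, 3, 4, 7, 9, 10, 20};
        System.out.println(binarySearch(arr, -1)); // 0
        System.out.println(binarySearch(arr, 20)); // 9
        System.out.println(binarySearch(arr, 5));  // -1
    }
}
```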
Now, let's say we also add a check for start and end. What does this gain us? Well, if at any point the target happens to be at the very start or end of a segment, we skip a few steps and end slightly sooner. Is this a likely event?
For small toy examples with a few elements, yeah, maybe.
For a massive real-world dataset with billions of entries? Hm, let's think about it. For the sake of simplicity, we assume that we know the target is in the array.
We start with the whole array. Is the first element the target? The odds of that are one in a billion. Pretty unlikely. Is the last element the target? The odds of that are also one in a billion. Pretty unlikely too. You've wasted two extra comparisons to speed up an extremely unlikely case.
We limit ourselves to, say, the first half. We do the same thing again. Is the first element the target? Probably not since the odds are one in half a billion.
...and so on.
The bigger the dataset, the more useless the start/end "optimization" becomes. In fact, in terms of (maximally optimized) comparisons, each step of the algorithm has three comparisons instead of the usual one. VERY roughly estimated, that suggests that the algorithm on average becomes three times slower.
Even for smaller datasets, it is of dubious use, since it basically turns into a quasi-linear search instead of a binary search. Yes, the odds are higher, but on average we can expect a larger number of comparisons before we reach our target.
The whole point of a binary search is to reach the target with as few wasted comparisons as possible. Adding more unlikely-to-succeed comparisons is typically not the way to improve that.
Edit:
The implementation as posted by OP may also confuse the issue slightly. The implementation chooses to make two comparisons between target and mid. A more optimal implementation would instead make a single three-way comparison (i.e. determine ">", "=" or "<" as a single step instead of two separate ones). This is, for instance, how Java's compareTo or C++'s <=> normally works.
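As a sketch of the single three-way comparison described above, assuming the elements implement `Comparable` (the class and method names are hypothetical): `compareTo` is called once per step and its sign picks the branch. Testing the resulting `int` twice is cheap; what matters is that the potentially expensive element comparison happens only once. The JDK's `Collections.binarySearch` offers the same behaviour for lists.

```java
import java.util.List;

public class ThreeWaySearch {
    // Binary search making one three-way element comparison per step via compareTo.
    static <T extends Comparable<? super T>> int search(List<? extends T> sorted, T target) {
        int lo = 0, hi = sorted.size() - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            int c = sorted.get(mid).compareTo(target); // <0, 0 or >0 from a single call
            if (c == 0) return mid;
            if (c < 0) lo = mid + 1; else hi = mid - 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "banana", "cherry", "date");
        System.out.println(search(words, "cherry")); // 2
        System.out.println(search(words, "fig"));    // -1
    }
}
```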
BambooleanLogic's answer is correct and comprehensive. I was curious about how much slower this 'optimization' made binary search, so I wrote a short script to test the change in how many comparisons are performed on average:
Given an array of the integers 0, ..., N - 1 (an array of size N), do a binary search for every integer in the array and count the total number of array accesses made.
To be fair to the optimization, I made it so that after checking arr[left] against target, we increase left by 1, and similarly for right, so that every comparison is as useful as possible. You can try this yourself at Try it online
Results:
Binary search on size 10: Standard 29 Optimized 43 Ratio 1.4828
Binary search on size 100: Standard 580 Optimized 1180 Ratio 2.0345
Binary search on size 1000: Standard 8987 Optimized 21247 Ratio 2.3642
Binary search on size 10000: Standard 123631 Optimized 311205 Ratio 2.5172
Binary search on size 100000: Standard 1568946 Optimized 4108630 Ratio 2.6187
Binary search on size 1000000: Standard 18951445 Optimized 51068017 Ratio 2.6947
Binary search on size 10000000: Standard 223222809 Optimized 610154319 Ratio 2.7334
So the total number of comparisons does seem to approach three times the standard number, implying the optimization becomes increasingly unhelpful for larger arrays. I'd be curious whether the limiting ratio is exactly 3.
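A rough Java version of that experiment, for anyone who cannot reach the linked script. It is a sketch under my own counting convention (one count per array element read, with `left`/`right` shrunk by one after each failed end check), so the totals are not guaranteed to match the table above exactly; `CountAccesses` and its methods are hypothetical names:

```java
public class CountAccesses {
    // Standard search: one read of arr[mid] per step, used as a three-way comparison.
    static long standard(int[] arr, int target) {
        long count = 0;
        int lo = 0, hi = arr.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            count++;                              // read arr[mid]
            if (arr[mid] == target) break;
            if (target < arr[mid]) hi = mid - 1; else lo = mid + 1;
        }
        return count;
    }

    // Variant that also checks both ends of the current range before probing mid.
    // A failed end check shrinks the range by one, so no element is read twice.
    static long withEndChecks(int[] arr, int target) {
        long count = 0;
        int lo = 0, hi = arr.length - 1;
        while (lo <= hi) {
            count++;                              // read arr[lo]
            if (arr[lo] == target) break;
            lo++;
            if (lo > hi) break;
            count++;                              // read arr[hi]
            if (arr[hi] == target) break;
            hi--;
            if (lo > hi) break;
            int mid = lo + (hi - lo) / 2;
            count++;                              // read arr[mid]
            if (arr[mid] == target) break;
            if (target < arr[mid]) hi = mid - 1; else lo = mid + 1;
        }
        return count;
    }

    // Searches for every value of the array 0, 1, ..., n - 1 and sums the reads.
    static long total(int n, boolean endChecks) {
        int[] arr = new int[n];
        for (int i = 0; i < n; i++) arr[i] = i;
        long sum = 0;
        for (int t = 0; t < n; t++) {
            sum += endChecks ? withEndChecks(arr, t) : standard(arr, t);
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int n = 10; n <= 100_000; n *= 10) {
            long s = total(n, false), e = total(n, true);
            System.out.printf("size %d: standard %d, end checks %d, ratio %.4f%n", n, s, e, (double) e / s);
        }
    }
}
```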
Adding extra checks for start and end alongside the mid check does not buy much.
In algorithm design, the main concern is complexity, whether time complexity or space complexity, and time complexity is usually considered the more important of the two. Extra start/end checks leave the O(log n) bound unchanged; they only add comparisons to every step.
It is also worth learning the binary search variants for different use cases, for example:
- the array contains no repeated elements;
- the array contains repeated elements, in which case you may want to
  a) return the leftmost matching index/value, or
  b) return the rightmost matching index/value;
- and many more.
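For the repeated-elements case, one common approach is a pair of half-open "lower bound" / "upper bound" searches. A minimal Java sketch, assuming a sorted `int[]`; the class and method names are hypothetical:

```java
public class BoundsSearch {
    // First index i with a[i] >= target (a.length if none).
    static int lowerBound(int[] a, int target) {
        int lo = 0, hi = a.length;               // half-open search range [lo, hi)
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // First index i with a[i] > target (a.length if none).
    static int upperBound(int[] a, int target) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] <= target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Leftmost index of target, or -1 if absent.
    static int leftmost(int[] a, int target) {
        int i = lowerBound(a, target);
        return (i < a.length && a[i] == target) ? i : -1;
    }

    // Rightmost index of target, or -1 if absent.
    static int rightmost(int[] a, int target) {
        int i = upperBound(a, target) - 1;
        return (i >= 0 && a[i] == target) ? i : -1;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 2, 2, 5, 7};
        System.out.println(leftmost(a, 2) + " " + rightmost(a, 2)); // 1 3
        System.out.println(leftmost(a, 4) + " " + rightmost(a, 4)); // -1 -1
    }
}
```

C++'s `std::lower_bound` and `std::upper_bound` follow the same half-open convention.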

Randomly increasing sequence - Wolfram Mathematica

Good afternoon, I have a problem making a recurrence table with a randomly increasing sequence. I want it to return an increasing sequence with a random difference between consecutive elements. Right now I've got:
RecurrenceTable[{a[k+1]==a[k] + RandomInteger[{0,4}], a[1]==-12},a,{k,1,5}]
But it returns an arithmetic progression with the same d for all k (e.g. {-12,-8,-4,0,4,8,12,16,20,24}).
Also, I would be really grateful if someone could explain why, if I replace every k in my code with n, I get:
RecurrenceTable[{4+a[n] == a[n],a[1] == -12},a,{n,1,10}]
Thank you very much for your time!
I don't believe that RecurrenceTable is what you are looking for.
Try this instead:
FoldList[Plus,-12,RandomInteger[{0,4},5]]
which returned, on one run,
{-12,-8,-7,-3,1,2}
and, on another run,
{-12,-9,-5,-3,0,1}
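`FoldList[Plus, -12, increments]` is a running sum: each output element is the previous one plus the next increment, and since `RandomInteger[{0,4},5]` produces five independent draws, every step can differ (the question's output shows the RecurrenceTable version using one increment throughout). Outside Mathematica, the same running-sum idea might look like this Java sketch; the method name and the `Random` parameter are assumptions for illustration. Because 0 is an allowed increment, the result is non-decreasing rather than strictly increasing.

```java
import java.util.Arrays;
import java.util.Random;

public class RandomIncreasing {
    // Running sum of random increments, analogous to FoldList[Plus, first, RandomInteger[{0, maxStep}, steps]].
    // Returns steps + 1 values: first, then each previous value plus a fresh draw in [0, maxStep].
    static int[] randomIncreasing(int first, int steps, int maxStep, Random rng) {
        int[] out = new int[steps + 1];
        out[0] = first;
        for (int i = 1; i <= steps; i++) {
            out[i] = out[i - 1] + rng.nextInt(maxStep + 1); // new random increment every step
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(randomIncreasing(-12, 5, 4, new Random())));
    }
}
```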

Time and Space Complexity (for a specific algorithm)

Despite the last 30 minutes I spent trying to understand time and space complexity better, I still can't confidently determine them for the algorithm below:
bool checkSubstr(std::string sub)
{
    // 6 OR (||)-connected if statement (checks whether the parameter
    // is among the items in the list)
}

void checkWords(int start, int end)
{
    int wordList[2] = {0};
    int j = 0;
    if (start < 0)
    {
        start = 0;
    }
    if (end > cAmount)
    {
        end = cAmount - 1;
    }
    if (end - start < 2)
    {
        return;
    }
    for (int i = start; i <= end - 2; i++)
    {
        if (crystals[i] == 'I' || crystals[i] == 'A')
        {
            continue;
        }
        if (checkSubstr(crystals.substr(i, 3)))
        {
            wordList[j] = i;
            j++;
        }
    }
    if (j == 1)
    {
        crystals.erase(wordList[0], 3);
        cAmount -= 3;
        checkWords(wordList[0] - 2, wordList[0] + 1);
    }
    else if (j == 2)
    {
        crystals.erase(wordList[0], (wordList[1] - wordList[0] + 3));
        cAmount -= wordList[1] - wordList[0] + 3;
        checkWords(wordList[0] - 2, wordList[0] + 1);
    }
}
The function basically checks a sub-string of the whole string for predetermined three-letter combinations (e.g. "SAN"). Sub-string length can be 4-6, no real way to determine, depends on the input (pretty sure it's not relevant, although not 100%).
My reasoning:
If there are n letters in the string, in the worst case we have to check each of them. Again depending on the input, this can happen in three ways:
All sub-strings of length 6: in this case the function runs n/6 times, each run performing 8 (or 10?) operations, which (I think) means that its time complexity is O(n).
All sub-strings of length 4: pretty much the same reasoning as above, O(n).
Sub-strings of lengths 4 and 6 mixed: I can't see why this would be different from the previous two. O(n).
As for the space complexity, I am completely lost. However, I have an idea:
If the function recurs the maximum number of times, it will require:
n/4 × (the amount used in one run)
which made me think it should be O(n). However, I'm not convinced this is correct. I thought seeing someone else's thought process on this example might help me understand how to calculate time and space complexity better.
Thank you for your time.
EDIT: Let me provide clearer information. We read a combination of 6 different letters into a string; this can be (almost) any combination of any length. 'crystals' is the string, and we are looking for 6 different three-letter combinations in that list of letters, sort of like a jewel-matching game. The starting list contains no matches (none of the 6 predetermined combinations exist in the first place). Therefore, the only way matches can occur from then on is through swaps or through matches disappearing. Once a swap is processed by the top-level code, the function is called to check for matches, and if a match is found, the function recurs after deleting the "match" part of the string.
Now let's look at how the code looks for a match. To demonstrate a swap of 2 letters:
ABA B-R ZIB (no spaces or '-' in the actual string; they are only used for better demonstration),
B and R are being swapped. This swap only affects the 6 letters starting from the 2nd letter and ending on the 7th letter. In other words, the letters that the first A and the last B can form a match with are the same before and after the swap, so there is no point checking for matches that include those letters. So a sub-string of 6 letters is sent to the checking algorithm. Similarly, if a formed match disappears (gets deleted from the string), the range of affected letters is 4. So when I thought of a worst-case scenario, I imagined either 1 swap creating a whole chain reaction and matching all the way until there are not enough letters left to form a match, or each match happening through a swap. Again, I am not saying this is how we should think when calculating time and space complexity, but this is how the code works. I hope this is clear enough; if not, let me know and I can provide more details. It's also important to note that the number and positions of the swaps are part of the input we read.
EDIT: Here is how the function is called on top level for the first time:
checkWords(swaps[i]-2,swaps[i]+3);
"Sub-string length can be 4-6, no real way to determine, depends on the input (pretty sure it's not relevant, although not 100%)."
That's not what the code shows; the line if (checkSubstr(crystals.substr(i,3))) conveys that substrings always have exactly 3 characters. If the substring length varies, it is relevant, since your naive substring match will degrade to O(N*M) in the general case, where N is end-start+1 (the size of the input string) and M is the size of the substring being searched. This happens because in the worst case you'll compare M characters for each of the N characters of the source string.
The rest of this answer assumes that substrings are of size 3, since that's what the code shows.
If substrings are always 3 characters long, it's different: you can essentially assume checkSubstr() is O(1) because you will always compare at most 3 characters. The bulk of the work happens inside the for loop, which is O(N), where N is end-1-start.
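To illustrate why a fixed-length lookup keeps the loop linear, here is a hedged Java sketch of the scan. The word set and all names are hypothetical (the question only names "SAN"), and it omits the question's skip of positions starting with 'I' or 'A'. Each iteration takes a 3-character substring and does one hash-set lookup, so the work per iteration does not depend on the window size:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class WindowScan {
    // Hypothetical set of three-letter combinations; only "SAN" comes from the question.
    static final Set<String> WORDS = Set.of("SAN", "RAB");

    // Start indices in the clamped window [start, end] where a three-letter word begins.
    // Constant work per iteration, so the loop is O(N) for N = end - start + 1.
    static List<Integer> findMatches(String s, int start, int end) {
        start = Math.max(start, 0);
        end = Math.min(end, s.length() - 1);
        List<Integer> hits = new ArrayList<>();
        for (int i = start; i <= end - 2; i++) {
            if (WORDS.contains(s.substring(i, i + 3))) hits.add(i);
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findMatches("XSANRABX", 0, 7)); // [1, 4]
    }
}
```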
After the loop, in the worst case (when one of the ifs is entered), you erase a bunch of characters from crystals. Assuming this is a string backed by an array in memory, this is an O(cAmount) operation, because all elements after wordList[0] must be shifted. The recursive call always passes in a range of size 4; it neither grows nor shrinks with the size of the input, so you can also say there are O(1) recursive calls.
Thus, time complexity is O(N+cAmount) (where N is end-1-start), and space complexity is O(1).

Efficiently: Random numbers in fixed range without repetitions

Hey guys, I know that there are a million questions on random numbers, but exactly because of that I searched a lot but I couldn't find something similar to mine - without implying it's not there. In any case, pardon me if I am repeating a question, just point me to it if that's the case.
So, I wanna do something simple in the most efficient way.
I want to generate all N integers in the range [0, N) in random order, one by one, such that there are no repetitions.
I know I can do this by inserting everything into a list, shuffling it, taking the head, and then removing the head from the list. But then I will have shuffled my list of length N, N-1 times.
Any better / faster idea?
You can just do one shuffle, and then step through the list.
I'd recommend a Fisher-Yates shuffle.
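A minimal Fisher-Yates sketch in Java (class and method names are illustrative): shuffle once in O(N), then read the array front to back, which yields each value in [0, N) exactly once. `java.util.Collections.shuffle` does the same job for a `List`.

```java
import java.util.Random;

public class ShuffleOnce {
    // Fisher-Yates: returns the integers 0..n-1 in uniformly random order.
    static int[] randomPermutation(int n, Random rng) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        for (int i = n - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);          // 0 <= j <= i
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
        return a;
    }

    public static void main(String[] args) {
        // Shuffle once, then step through the array to get each value exactly once.
        for (int v : randomPermutation(10, new Random())) System.out.println(v);
    }
}
```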
This question has been asked a few times, and in each case the accepted answer is to shuffle an array (either the original, or an array of indices); however, this isn't a satisfactory answer when the number of possible indices is prohibitively large (either it's huge, or memory is tight, or you simply crave maximum efficiency for whatever reason).
As such I want to add an alternative for the sake of completeness. Now, this isn't truly random, so if that's what you need then do not use this, however, if your goal is simply "good enough" with minimal memory requirements then the following pseudo-code may be of interest:
function init:
    start = random [0, length)          // Pick a fully random starting index
    stride = random [1, length - 1)     // Pick a random step size (needs length >= 3)
    next_index = start

function advance_next_index:
    next_index = (next_index + stride) % length
    if next_index is equal to start then   // Cycle closed: move on to the next starting index
        start = (start + 1) % length
        next_index = start

Here's an example of how to implement a re-usable function for grabbing pseudo-random values:

counter = length
function pseudo_random:
    if counter is equal to length then  // Re-initialise on the first call and every length calls
        init()
        counter = 0
    counter = counter + 1
    result = next_index                 // Return the current index before advancing
    advance_next_index()
    return result
Quite simply pseudo_random will call init once every length iterations, thus re-shuffling the "random" pattern of results produced by advance_next_index, and ensure that for every length values there is not a single duplicate.
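A Java sketch of the stride idea, under a few assumptions of my own: the class name is hypothetical, `java.util.Random` stands in for `random`, the stride is drawn from [1, length - 1) (so `length` must be at least 3), and `next()` returns the current index before advancing and calls `init()` on the very first call. With those choices, each block of `length` consecutive calls yields every index in [0, length) exactly once, even when `stride` shares a factor with `length`, because a closed cycle moves on to the next starting index.

```java
import java.util.Random;

public class StridePermutation {
    private final int length;
    private final Random rng;
    private int start, stride, nextIndex;
    private int counter;

    public StridePermutation(int length, Random rng) {
        if (length < 3) throw new IllegalArgumentException("length must be at least 3");
        this.length = length;
        this.rng = rng;
        this.counter = length; // forces init() on the first call to next()
    }

    // Random starting index in [0, length) and stride in [1, length - 1).
    private void init() {
        start = rng.nextInt(length);
        stride = 1 + rng.nextInt(length - 2);
        nextIndex = start;
    }

    // Step by stride; when the walk returns to start, move to the next starting index.
    private void advance() {
        nextIndex = (nextIndex + stride) % length;
        if (nextIndex == start) {
            start = (start + 1) % length;
            nextIndex = start;
        }
    }

    // Every block of length consecutive calls is a permutation of [0, length).
    public int next() {
        if (counter == length) {
            init();
            counter = 0;
        }
        counter++;
        int result = nextIndex;
        advance();
        return result;
    }

    public static void main(String[] args) {
        StridePermutation p = new StridePermutation(10, new Random());
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) sb.append(p.next()).append(' ');
        System.out.println(sb.toString().trim());
    }
}
```

Memory use is a handful of ints regardless of `length`, which is the point of this approach compared with shuffling an array.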
To reiterate: this isn't a particularly random algorithm, so it must not be used in situations where true randomness is required. However, the results are random enough for some basic, non-critical tasks, and it has a tiny memory footprint. For example, it suits cases where you just want to randomise some behaviour in a game to avoid it becoming repetitive, or where the data-set is large and never exposed to the user (in which case it is effectively random to them, and it would take a long time to piece together the order and somehow exploit it).
If anyone knows of any better algorithms with similar properties then please share!

Determining longest repeating cycle in a decimal expansion

Today I encountered this article about decimal expansion, and I was instantly inspired to rework my solution to Project Euler Problem 26 to use this new mathematical knowledge for a more efficient solution (no brute forcing). In short, the problem is to find the value of d in the range 1-1000 that maximizes the length of the repeating cycle in the expression "1/d".
Without making any further assumptions about the problem that could improve the efficiency of the solution, I decided to stick with
10^s ≡ 10^(s+t) (mod n)
which allows me, for any value of D, to find the length of the repeating cycle (t) and the starting point of the cycle (s).
The problem is the exponential part of the equation, since it generates extremely large values before they're reduced by the modulus. No integer type can hold values this large, and the floating-point data types seem to calculate them wrongly.
I'm using this code currently:
Private Function solveDiscreteLogarithm(ByVal D As Integer) As Integer
    Dim NumberToIndex As New Dictionary(Of Long, Long)()
    Dim maxCheck As Integer = 1000
    For index As Integer = 1 To maxCheck
        If (Not NumberToIndex.ContainsKey((10 ^ index) Mod D)) Then
            NumberToIndex.Add((10 ^ index) Mod D, index)
        Else
            Return index - NumberToIndex((10 ^ index) Mod D)
        End If
    Next
    Return -1
End Function
which at some point will compute "(10^47) mod 983" resulting in 783 which is not the correct result. The correct result should have been 732. I'm assuming it's because I'm using integral data types and it's causing overflow. I tried using double instead, but that gave even stranger results.
So what are my options?
Instead of using ^ to compute your powers, I would use a for loop that multiplies step by step and takes the modulus as you go along, using a conditional to check whether the number calculated is greater than the modulus. This keeps the numbers small and within the range of your modulus.
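A sketch of that suggestion, written in Java rather than the question's VB.NET: multiply step by step and reduce with `%` after every multiplication (the `%` plays the role of the conditional check), so the running value stays below `m` and each product fits in a `long` as long as `m * m` does. `modPowLoop` is a hypothetical name; `BigInteger.modPow` is the JDK's built-in equivalent.

```java
import java.math.BigInteger;

public class ModPowLoop {
    // base^exp mod m by repeated multiplication, reducing after every step.
    // result and b both stay below m, so result * b fits in a long while m * m does.
    static long modPowLoop(long base, int exp, long m) {
        long result = 1 % m;                     // handles m == 1
        long b = base % m;
        for (int i = 0; i < exp; i++) {
            result = (result * b) % m;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(modPowLoop(10, 47, 983)); // 732
        // Cross-check with the JDK's arbitrary-precision implementation (also 732).
        System.out.println(BigInteger.TEN.modPow(BigInteger.valueOf(47), BigInteger.valueOf(983)));
    }
}
```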
I'll give you a hint from my own solution to this.
With each decimal expansion of the fraction, you end up with a remainder, which if multiplied by the current decimal place, is an integer. Since this remainder is all you need to determine the next decimal expansion, you can use it to make predictions about the subsequent expansion.
See my post on this other question, getting the nth digit of a fraction; you may find some useful leads there on what to try. (Methinks the answer is the largest prime less than 1000.) (Correction: the largest prime or Carmichael number less than 1000.)
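The remainder hint can be sketched as follows, again in Java and with hypothetical names. In the long division of 1/d, the remainder after k steps is 10^k mod d, so this is the question's dictionary approach with the power kept reduced at every step; the first remainder seen twice closes the cycle.

```java
import java.util.HashMap;
import java.util.Map;

public class RecurringCycle {
    // Length of the recurring cycle of 1/d in base 10 (0 if the expansion terminates).
    // The remainder alone determines all later digits, so a repeated remainder closes the cycle.
    static int cycleLength(int d) {
        Map<Integer, Integer> firstSeen = new HashMap<>(); // remainder -> step it first appeared
        int remainder = 1 % d;
        for (int step = 0; remainder != 0; step++) {
            Integer prev = firstSeen.putIfAbsent(remainder, step);
            if (prev != null) return step - prev;
            remainder = (remainder * 10) % d;
        }
        return 0;
    }

    // Smallest d in [2, limit) whose reciprocal has the longest recurring cycle.
    static int longestCycleBelow(int limit) {
        int best = 2, bestLen = cycleLength(2);
        for (int d = 3; d < limit; d++) {
            int len = cycleLength(d);
            if (len > bestLen) { best = d; bestLen = len; }
        }
        return best;
    }

    public static void main(String[] args) {
        int d = longestCycleBelow(1000);
        System.out.println(d + " has a cycle of length " + cycleLength(d));
    }
}
```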