Inserting into a BST - binary-search-tree

I want to create a visualization of a BST, but every example I find online stops after inserting only 7 or less values. Let's say I'm doing the following sequence:
insert(5),insert(7),insert(9),insert(8),insert(3),insert(2),insert(4),insert(6),insert(10).
Up until insert(6), I end up with:
My main question is: where do I go from here? Do I add on to my left-most leaf or do I add on to my "lowest" leaf?
Also: according to wikipedia the code for an insertion is:
void insert(Node* node, int value) {
if (value < node->key) {
if (node->leftChild == NULL)
node->leftChild = new Node(value);
else
insert(node->leftChild, value);
} else {
if(node->rightChild == NULL)
node->rightChild = new Node(value);
else
insert(node->rightChild, value);
}
}
But according to this, once I'm at 8 and I get insert(3), it would add 3 to the left of 8, as it would compare 3 with the node 9, see that the less-than spot is already taken by 8, then rerun the insertion with 8 being the node compared to, and place the 3 as the left child of 8. But this would just create kind of a list.
Thanks.

What seems to mislead you is that on every insert, you must start from the root (which, in your case, is 5). So, let's take the graph above and try to insert 3 following the algorithm you pasted:
3 < 5, we go left and we meet 3
3 == 3, we go right (here it's the same, the code above says "go right") and we meet 4
3 < 4, we go left. Since 4 has no left child, 3 becomes its left child.
Try to build your tree from zero using the algorithm above. Incidentally, you won't find many examples with n > 10 nodes, because they tend to be exceedingly long without any benefit for the reader.

Related

How do I return the sum of array with exceptions by using a while loop in a function?

Python beginner here.
I already have the solution to the question but I'm not understanding why the "add" variable in the solution plays a role of creating exceptions to remove numbers between 6 and 9. I already tried Python Tutor but still not understanding. Many thanks in advance!
QUESTION: Return the sum of the numbers in the array, except ignore sections of numbers starting with a 6 and extending to the next 9 (every 6 will be followed by at least one 9). Return 0 for no numbers.
Sample Solution code
def summer_69(arr):
total = 0
add = True
for num in arr:
while add:
if num != 6:
total += num
break
else:
add = False
while not add:
if num != 9:
break
else:
add = True
break
return total
Sample answers:
summer_69([1, 3, 5]) --> 9
summer_69([4, 5, 6, 7, 8, 9]) --> 9
summer_69([2, 1, 6, 9, 11]) --> 14
You can think of the variable "add" as a flag. I think that might be a better name for this variable in this instance.
It is only being used to tell if you have run into a 6 within the sequence of numbers in the array, then once it has been set it goes through an arbitrary amount of numbers in the array until it gets a 9 and then it resets the flag.
It may help to rename the variable "add" as "flag". Have your new variable "flag" default to False and then if you run into a 6 set "flag" to true. Once the flag is on do not add any trailing numbers in the sequence until you run into the number 9 then reset to false.
Perhaps that will help the readability. Naming variables is the hardest part of programming.

Discrete Binary Search Main Theory

I have read this: https://www.topcoder.com/community/competitive-programming/tutorials/binary-search.
I can't understand some parts==>
What we can call the main theorem states that binary search can be
used if and only if for all x in S, p(x) implies p(y) for all y > x.
This property is what we use when we discard the second half of the
search space. It is equivalent to saying that ¬p(x) implies ¬p(y) for
all y < x (the symbol ¬ denotes the logical not operator), which is
what we use when we discard the first half of the search space.
But I think this condition does not hold when we want to find an element(checking for equality only) in an array and this condition only holds when we're trying to find Inequality for example when we're searching for an element greater or equal to our target value.
Example: We are finding 5 in this array.
indexes=0 1 2 3 4 5 6 7 8
1 3 4 4 5 6 7 8 9
we define p(x)=>
if(a[x]==5) return true else return false
step one=>middle index = 8+1/2 = 9/2 = 4 ==> a[4]=5
and p(x) is correct for this and from the main theory, the result is that
p(x+1) ........ p(n) is true but its not.
So what is the problem?
We CAN use that theorem when looking for an exact value, because we
only use it when discarding one half. If we are looking for say 5,
and we find say 6 in the middle, the we can discard the upper half,
because we now know (due to the theorem) that all items in there are > 5
Also notice, that if we have a sorted sequence, and want to find any element
that satisfies an inequality, looking at the end elements is enough.

Processing loading table data

I have a text file "celldata.txt" containing a very simple table of data.
1 2 3 4
5 6 7 8
9 10 11 12
1 2 3 4
2 3 4 5
The problem is when it comes to accessing the data at a certain column and row.
My approach has been to load using loadTable.
Table table;
int numCols;
int numRows;
void setup() {
size(200,200);
table = loadTable("celldata.txt","tsv");
numRows=table.getRowCount();
numCols=table.getColumnCount();
}
void draw() {
background(255);
fill(0);
text(numRows +" "+ numCols,100,100); // Check num of cols and rows
println(table.getFloat(0,0));
}
Question 1: When I do this, it says the number of rows are 5 and the number of columns is just 1. Why is it not 5 x 4?
Question 2: Why is table.getFloat(0,0) "NaN" instead of the first element of the data?
I want to use a much bigger matrix later and access certain elements (of type double) with something like getFloat(i,j) and be able to loop through all elements.
Using the same example data as I, can someone please help me understand what is wrong with my code and how to access the textfile's data? Should I be using another method than loadTable?
You've told Processing that the file contains tab separated values (by using the "tsv" option), but your file contains space separated values.
Since your file does not contain any tabs, it reads the entire row as a single value. So the 0,0 position of your table is 1 2 3 4, which isn't a number- hence the NaN. This is also why it thinks your table only has one column.
You should modify your celldata.txt file to actually be separated by tabs instead of spaces:
1 2 3 4
5 6 7 8
9 10 11 12
1 2 3 4
2 3 4 5
You could also separate them by commas and then use the "csv" option.
If you're still having trouble, you can see what Processing is reading in by adding saveTable(table, "data/new.csv"); to the end of your setup() function and then looking at that file. It will be a list of values separated by commas, so you can see exactly where Processing thinks the cells of the table are.

Dataframe non-null values differ from value_counts() values

There is an inconsistency with dataframes that I cant explain. In the following, I'm not looking for a workaround (already found one) but an explanation of what is going on under the hood and how it explains the output.
One of my colleagues which I talked into using python and pandas, has a dataframe "data" with 12,000 rows.
"data" has a column "length" that contains numbers from 0 to 20. she wants to divided the dateframe into groups by length range: 0 to 9 in group 1, 9 to 14 in group 2, 15 and more in group 3. her solution was to add another column, "group", and fill it with the appropriate values. she wrote the following code:
data['group'] = np.nan
mask = data['length'] < 10;
data['group'][mask] = 1;
mask2 = (data['length'] > 9) & (data['phraseLength'] < 15);
data['group'][mask2] = 2;
mask3 = data['length'] > 14;
data['group'][mask3] = 3;
This code is not good, of course. the reason it is not good is because you dont know in run time whether data['group'][mask3], for example, will be a view and thus actually change the dataframe, or it will be a copy and thus the dataframe would remain unchanged. It took me quit sometime to explain it to her, since she argued correctly that she is doing an assignment, not a selection, so the operation should always return a view.
But that was not the strange part. the part the even I couldn't understand is this:
After performing this set of operation, we verified that the assignment took place in two different ways:
By typing data in the console and examining the dataframe summary. It told us we had a few thousand of null values. The number of null values was the same as the size of mask3 so we assumed the last assignment was made on a copy and not on a view.
By typing data.group.value_counts(). That returned 3 values: 1,2 and 3 (surprise) we then typed data.group.value_counts.sum() and it summed up to 12,000!
So by method 2, the group column contained no null values and all the values we wanted it to have. But by method 1 - it didnt!
Can anyone explain this?
see docs here.
You dont' want to set values this way for exactly the reason you pointed; since you don't know if its a view, you don't know that you are actually changing the data. 0.13 will raise/warn that you are attempting to do this, but easiest/best to just access like:
data.loc[mask3,'group'] = 3
which will guarantee you inplace setitem

What is Lazy Binary Search?

I don't know whether the term "Lazy" Binary Search is valid, but I was going through some old materials and I just wanted to know if anyone can explain the algorithm of a Lazy Binary Search and compare it to a non-lazy Binary Search.
Let's say, we have this array of numbers:
2, 11, 13, 21, 44, 50, 69, 88
How to look for the number 11 using a Lazy Binary Search?
Justin was wrong.
The common binarySearch algorithm first checks whether the target is equal to current middle entry before proceeding to the left or right halves if required. Lazy binarySearch algorithm postpones the equality check until the very end.
algorithm lazyBinarySearch(array, n, target)
left<-0
right<-n-1
while (left<right) do
mid<-(left+right)/2
if(target>array[mid])then
left<-mid+1
else
right<-mid
endif
endwhile
if(target==array[left])then
display "found at position", left
else
display "not found"
endif
In your case, in an array,
2 11 13 21 44 50 69 88
and you want to search for 11
First we do a trace of common binary search,
index 0 1 2 3 4 5 6 7
2 11 13 21 44 50 69 88
left mid right
First while loop:
left <= right, we enter the first while loop. We calculated the mid index by (0+7)/2=3.5=3 by integer division, mid = 3. straight away we check if target 11 is equal to the mid index entry, 11 != 21, then we decide whether to go left or right, we finds out 11 < 21, should go left. left index remains unchanged, right index becomes mid index -1, right = 3 - 1 = 2. Done this step.
Second while loop:
left <= right, 0 <= 2, we enter the seond while loop. Mid index is recalcuated: (0+2)/2=1, mid = 1. At once we do the equality check, target 11 is the same as the index 1 entry, 11 == 11. We found this entry, leaving behind all the left right mid indexes (don't care) and breaks out the while loop, return index 1.
Now we trace this search by lazy binazySearch algorithm, initial array with left/right indexes set up the same as previous.
First while loop:
left < right, we enter the first while loop. Mid index is calculated as the same above = 3. Instead of doing an equality check in common binarySearch we do a comparison with the mid index entry this time. And the comparison only checks if our target 11 is greater than the mid index entry, leaving whether they equal or not to the very end outside the while loop. So we find out 11 < 21, right index is reset to the mid index, right = 3. Done this step.
Second while loop:
left < right, 0 < 3, we enter the second while loop. mid index is recalculated as mid = (0+3)/2 = 1 by integer division. Again we do a comparison with mid index entry 11, we realise it's not greater than mid index entry. We fall into the else part of the while loop and reset the right index to be mid index, right = 1. Done this step.
Third while loop:
left < right, 0 < 1, this time we have to re-enter the while loop again since it still satisfies the while condition. Mid index becomes (0+1)/2=0, mid = 0. After comparing target 11 with mid index entry 2 we found out it's greater than it (11 > 2), left index is reset to mid + 1, left = 0 + 1 = 1. Done this step.
With left index = 1 and right index = 1, left = right, while loop condition is no longer satisfied, so there's no need to re-enter. We fall into the if part down below. Target 11 equals left index entry 11, we found the target and returns left index 1.
As you can see, lazy binarySearch does one more while loop step to finally realise the index is actually 1. And this is how the word "postpones the equality check" means in the definition I mentioned in the very beginning. Clearly the lazy binarySearch algorithm does more things than common binarySearch before reaching the program termination. And the term "lazy" is reflected in the time of when to check the target equals the mid index entry.
However lazy binarySearch is more preferable to use under some other circumstances, but it's not in the context of this case.
(The reasoning part of the algorithm is done by me, anyone wishes to copy please do credit)
source: "Data Structures Outside In With Java" by Sesh Venugopal, Prentice Hall
As far as I am aware "Lazy binary search" is just another name for "Binary search".