Discrete Binary Search in Topcoder Tutorial

I went through this code given in the Topcoder binary search tutorial:
binary_search(lo, hi, p):
    while lo < hi:
        mid = lo + (hi-lo)/2
        if p(mid) == true:
            hi = mid
        else:
            lo = mid+1
    if p(lo) == false:
        complain // p(x) is false for all x in S!
    return lo // lo is the least x for which p(x) is true
I cannot see why lo always ends up pointing at what we want, i.e. why lo is the least x for which p(x) is true.
I have tried this on examples and it holds, but I cannot reason about it logically.
A proof using an invariant that the loop maintains would be helpful.
Thanks.

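My understanding is more of a logical one:
The while loop terminates only when lo >= hi. Since we work with integers, lo <= mid < hi holds whenever lo < hi, so each iteration either sets hi = mid (which is at least lo) or lo = mid+1 (which is at most hi). lo therefore never overtakes hi, and the loop ends with lo equal to hi.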

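In your terms, the loop invariant is: if p(x) is true for any x in the original search space, then the least such x lies in [lo, hi]. When p(mid) is true, the least true x is at most mid, so hi = mid keeps it in range. When p(mid) is false, p is false for everything up to mid (assuming, as the tutorial does, that once p becomes true it stays true), so lo = mid+1 keeps it in range. The loop makes progress by cutting the search space roughly in half while maintaining the invariant. When the loop exits, the search space has only one value, lo (which equals hi). We perform the last check: if p(lo) is false, then no x makes p(x) true. Otherwise, lo is the least x we are looking for.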
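One way to convince yourself is to run the loop with the invariant written as assertions. The following is only a sketch in Python, not the tutorial's code: the names first and last and the None return value are my own additions, and the assertions are deliberately brute force, so it is meant for small ranges only.

```python
def binary_search(lo, hi, p):
    """Return the least x in [lo, hi] with p(x) true, or None if there is none.

    Assumes p is monotone on [lo, hi]: once p(x) is true, it stays true.
    """
    first, last = lo, hi  # hypothetical helpers: remember the original range
    while lo < hi:
        mid = lo + (hi - lo) // 2  # lo <= mid < hi, so the range always shrinks
        if p(mid):
            hi = mid       # the least true x is <= mid
        else:
            lo = mid + 1   # everything up to mid is false
        # Invariant, checked by brute force: nothing below lo is true,
        # and if anything in the original range is true, then hi is.
        assert all(not p(x) for x in range(first, lo))
        assert p(hi) or not any(p(x) for x in range(first, last + 1))
    return lo if p(lo) else None


print(binary_search(0, 100, lambda x: x * x >= 50))  # 8
```

If the assertions never fire for a monotone p, the value returned when the loop ends with lo == hi has to be the least true x, or nothing is true at all.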

Related

Gnuplot summation issue

I am trying to make a plot of a simple mutation accumulation process on a binary tree...
My technical problem in gnuplot is that I want to plot the probability of getting 2 mutations on a specific lineage of the tree. Here is the equation that determines it:
P_{2 mutation} = sum[k=0:n] (m/(2**(k+1)/(1-(1/2)**k)))*(1-exp(-mu*k))
(don't bother with the formula, I'm not sure that it is the correct one yet :))
where n is the number of levels of the binary tree, mu is the mutation rate, and m is the number of mutations previously thrown at random onto the graph's edges.
I want to plot this probability as a function of the number of levels of the binary tree.
Therefore I wrote a script which is something like this:
set term pngcairo size 800,600
set title "Probability of two mutations appearing in one cell lineage of an n-level binary tree"
set xlabel "number of levels (n)"
set ylabel "Probability of two mutations appearing (P_{2^{lin})"
set xrange[1:10]
set yrange[0:1]
set output '2mutvalsz.png'
set multiplot
do for[i=1:10]{
    mu = 0.1+(i*0.1)
    m = 4
    f(x)=(x/((2**(x+1))*(1-(0.5)**x)))
    if(m<floor(f(x)))
    {
        p(x)=sum [k=0:floor(x)](m*(1/((2**(x+1))*(1-(0.5)**x))))*(1-exp(-mu*k))
    }
    else
    {
        p(x)=1
    }
    plot p(x) lt i lw 1
}
unset multiplot
set output
So my problem is that I don't know whether what I do in the if statement is correct.
What I want is to enforce the condition m < f(x), where f(x) is the number of edges as a function of n. That has to be an integer, so I use floor(f(x)). The sum runs over the x values, which are the number of levels and have to be integers too, so I use floor(x) (like a Heaviside function, to make the x axis discrete).
I also get an error message:
gnuplot> load '2mutvalsz.plt'
line 27: undefined variable: x
where line 27 is the end of the do for loop.
So my questions are: is this a correct way to restrict the summation to integer x values, and why do I get the error message?
Thank you, and I hope everything is clear.
The error message is produced because the if statement in your script is evaluated as soon as gnuplot runs the script, not later when p(x) is plotted. Gnuplot tries to evaluate the condition of the if statement immediately, and since the variable x is not defined at that point, it produces the message above.
You could put everything together using the ternary operator:
p(x)=( m<floor(f(x)) )?( sum [k=0:floor(x)](m*(1/((2**(x+1))*(1-(0.5)**x))))*(1-exp(-mu*k)) ):1;
However, on the imposed x-range of [1:10] the function f(x) is always less than 1, so floor(f(x)) is 0 and, with m = 4, the condition m<floor(f(x)) is always false. p(x) therefore always evaluates to 1.
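The claim about f(x) is easy to check numerically outside gnuplot. Below is a rough Python sketch that copies the two expressions from the script. The value mu=0.2 is just the first value the loop would produce, and the Python function names are mine, so treat it as an illustration rather than a port of the script.

```python
import math


def f(x):
    # Same expression as f(x) in the gnuplot script.
    return x / ((2 ** (x + 1)) * (1 - 0.5 ** x))


def p(x, m=4, mu=0.2):
    # Mirrors the ternary form: the sum if m < floor(f(x)), otherwise 1.
    if m < math.floor(f(x)):
        term = m * (1 / ((2 ** (x + 1)) * (1 - 0.5 ** x)))
        return sum(term * (1 - math.exp(-mu * k))
                   for k in range(0, math.floor(x) + 1))
    return 1


# Print n, f(n), floor(f(n)) and p(n) for the plotted levels 1..10.
for n in range(1, 11):
    print(n, round(f(n), 4), math.floor(f(n)), p(n))
```

For every level from 1 to 10 the floor comes out as 0, so the sum branch is never taken and the plot would be a flat line at 1.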

Topcoder Binary search Tutorial

What we can call the main theorem states that binary search can be used if and only if for all x in S, p(x) implies p(y) for all y > x. This property is what we use when we discard the second half of the search space. It is equivalent to saying that ¬p(x) implies ¬p(y) for all y < x (the symbol ¬ denotes the logical not operator), which is what we use when we discard the first half of the search space.
Please explain this paragraph in simpler, more detailed terms.
Consider that p(x) is some property of x. When using binary search, this property is usually that x is greater than, less than, or equal to some other value k that you are looking for.
What we can call the main theorem states that binary search can be used if and only if for all x in S, p(x) implies p(y) for all y > x.
Let's say that x is some value in the middle of the list and you are looking for where k is. Let's also say that p(x) means that x is greater than k. If the list is sorted in ascending order, then all values y to the right of x (y > x) must also be greater than k (the relation is transitive), and so p(y) also holds for all such y. This is the basis of binary search: if you are looking for k and some value x is known to be greater than k, then all elements to its right are also greater than k. Notice that this is only true if the list is sorted. Consider the list [a,b,c] and a value k that you are looking for. If it's known that a < b and b < c, and k < b is true, then k < c must also be true.
This property is what we use when we discard the second half of the search space.
This is what the previous conclusion allows you to do. Since you know that the property that holds for x also holds for all y > x (that is, they are not the element you are looking for, because they are greater), it's safe to discard them, and you keep looking for k only in the lower half.
The rest of the paragraph says pretty much the same thing for discarding the lower half.
In short, p(x) is some property that, once it holds for a value x, also holds for all values to the right of x (here because "greater than" is transitive). ¬p(x), on the other hand, holds for all values to the left of x whenever it holds for x. By being able to conclude that those are not the elements you are looking for, you can conclude that it's safe to discard either half of the list.
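To make the "only if the list is sorted" point concrete, here is a small Python sketch. The helper names first_true and is_monotone and the sample lists are made up for illustration. It checks whether the predicate "greater than k" flips only once across a list, and runs the search on the sorted one.

```python
def first_true(a, p):
    """Index of the first element of a satisfying p, or len(a) if none does.

    Assumes p is monotone over a: false for a prefix, true for the rest.
    """
    lo, hi = 0, len(a)
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if p(a[mid]):
            hi = mid        # p holds at mid, hence for everything to its right
        else:
            lo = mid + 1    # p fails at mid, hence for everything to its left
    return lo


def is_monotone(a, p):
    # True if p never flips from true back to false walking a left to right.
    flags = [p(v) for v in a]
    return flags == sorted(flags)


k = 7
sorted_list = [1, 3, 4, 7, 9, 12]
unsorted_list = [9, 1, 12, 3, 7, 4]
greater_than_k = lambda v: v > k

print(is_monotone(sorted_list, greater_than_k))    # True
print(is_monotone(unsorted_list, greater_than_k))  # False
print(first_true(sorted_list, greater_than_k))     # 4
```

On the unsorted list the predicate is true, then false, then true again, so learning p at one position tells you nothing about either side, and neither half can be discarded.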

Shorten if condition x > 3 || y > 3 || z > 3

Is there a better way to write this condition:
if ( x > 3 || y > 3 || z > 3 ) {
...
}
I was thinking of some bitwise operation but couldn't find anything.
I searched over Google but it is hard to find something related to that kind of basic question.
Thanks!
Edit
I was thinking of programming in general. Would it differ by language, such as C/C++ or Java?
What you have is good. Assuming C/C++ or Java:
Intent is clear
Short circuit optimisation means the expression will be true as soon as any one part is true.
On that second point: if any of x, y or z is more likely to be > 3, put it at the left of the expression so it is evaluated first, which means the others may not need to be evaluated at all.
For argument's sake, if you must have a bitwise check, (x|y|z) > 3 works for non-negative values (the parentheses are needed in C/C++ and Java because > binds tighter than |), but it normally won't be reduced, so it's (probably*) always 2 bitwise ops and a compare, where the other way could be as fast as 1 compare.
(* This is where the language lawyers arrive and add comments on why this edit is wrong and the bitwise version can be optimised ;-)
There was a comment here (now deleted) along the lines of "new programmers shouldn't worry about this level of optimisation", and it was 100% correct. Write easy-to-follow, working code and THEN try to squeeze performance out of it AFTER you know it is "too slow".
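If you want to see when the bitwise form agrees with the original condition, a brute-force check is enough. This is a quick Python sketch with made-up function names. Python integers are arbitrary precision and it groups the expression differently from C, so it only illustrates the arithmetic and says nothing about performance.

```python
from itertools import product


def logical(x, y, z):
    return x > 3 or y > 3 or z > 3


def bitwise(x, y, z):
    # Explicit parentheses: in C/C++ and Java, > binds tighter than |.
    return (x | y | z) > 3


# Exhaustive agreement on a small non-negative range.
non_negative_ok = all(
    logical(x, y, z) == bitwise(x, y, z)
    for x, y, z in product(range(16), repeat=3)
)
print(non_negative_ok)  # True

# With a negative operand the two forms can disagree.
print(logical(-1, 5, 0), bitwise(-1, 5, 0))  # True False
```

The trick works because any value above 3 has a bit set at position 2 or higher, and OR preserves that bit. A negative operand sets the sign bit of the result, which is why the forms diverge for signed values.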

howmany() Macro Objective C

While using Xcode, I accidentally auto completed to the macro howmany(x,y) and traced it to types.h. The entire line reads as follows:
#define howmany(x, y) __DARWIN_howmany(x, y) /* # y's == x bits? */
This didn't really make much sense, so I followed the path a little more and found __DARWIN_howmany(x, y) in _fd_def.h. The entire line reads as follows:
#define __DARWIN_howmany(x, y) ((((x) % (y)) == 0) ? ((x) / (y)) : (((x) / (y)) + 1)) /* # y's == x bits? */
I have no idea what __DARWIN_howmany(x, y) does. Does the comment at the end of the line shed any light on the intended function of this macro? Could someone please explain what this macro does, how it is used, and its relevance in _fd_def.h
This is a fairly commonly used macro that helps programmers quickly answer the question: if my containers can each hold y things, how many containers do I need to hold x things?
So if your containers can hold five things each, and you have 18 things:
n = howmany(18, 5);
will tell you that you need four containers. Or, if my buffers are allocated in words, but I need to put n characters into them, and words are 8 characters long, then:
n = howmany(n, 8);
returns the number of words needed. This sort of computation is ubiquitous in buffer allocation code.
It is frequently defined as:
#define howmany(x, y) (((x)+(y)-1)/(y))
Also related is roundup(x, y), which rounds x up to the next multiple of y:
#define roundup(x, y) (howmany(x, y)*(y))
Based on what you've posted, the macro seems to be intended to answer a question like, "How many chars does it take to hold 18 bits?" That question could be answered with this line of code
int count = howmany( 18, CHAR_BIT );
which will set count to 3.
The macro works by first checking if y divides evenly into x. If so, it returns x/y, otherwise it divides x by y and rounds up.
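The two definitions quoted in this thread give the same result for non-negative x and positive y, which is easy to confirm. Here is a small Python sketch with function names of my own choosing, using // for the integer division that C performs on ints.

```python
def howmany_mod(x, y):
    # Mirrors __DARWIN_howmany: exact quotient if y divides x, else quotient + 1.
    return x // y if x % y == 0 else x // y + 1


def howmany_add(x, y):
    # Mirrors the common (x + y - 1) / y form.
    return (x + y - 1) // y


def roundup(x, y):
    # Next multiple of y that is >= x.
    return howmany_add(x, y) * y


print(howmany_mod(18, 5), howmany_add(18, 5))  # 4 4
print(howmany_mod(18, 8))                      # 3
print(roundup(18, 8))                          # 24
```

In C, the modulo form avoids the intermediate sum x + y - 1, which could overflow for values near the top of the integer range. That is one plausible reason to prefer it, though the header does not say so.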

get output as a vector in R during a loop

How can I get the output as a vector in R?
For example, if I want to have
for (i in 1:1000) {if i mod 123345 = 0, a = list(i)}
a
but I would want to find all i that divide evenly into 123345 (i.e., factors), and not just the largest one.
There may be a more concise way to do this, but I would do it this way:
i <- 1:1000
j <- i[12345 %% i == 0 ]
The resulting vector j contains the values in i which are factors of 12345. In R the modulo operator is %%, and it's rather hard to find when searching on your own. It's buried in the help document for arithmetic operators; you can find it by searching for +, which must be in quotes, like ?"+", and then reading down a bit.
You better add a VBA tag if you want to find a VBA answer. But I suspect it will involve the VBA modulo operator ;)
JD Long's method is really the first that came to mind, but another:
Filter(function(x) !(12345 %% x), 1:1000)
I think it's kind of fun to avoid any need for an explicit assignment. (Kind of too bad to create a new function each time.) (In this case, "!" converts a non-zero value to FALSE and zero to TRUE. "Filter" picks out each element evaluating to TRUE.)
Also avoiding the need for a separate allocation and not creating a new function:
which(!(12345 %% 1:1000))
Timing:
> y <- 1:1000
> system.time(replicate(1e5, y[12345 %% y == 0 ]))
   user  system elapsed
  8.486   0.058   8.589
> system.time(replicate(1e5, Filter(function(x) !(12345 %% x), y)))
Timing stopped at: 90.691 0.798 96.118 # I got impatient and killed it
# Even pulling the definition of the predicate outside,
# it's still too slow for me to want to wait for it to finish.
# I'm surprised Filter is so slow.
> system.time(replicate(1e5, which(!12345 %% y)))
   user  system elapsed
 11.618   0.095  11.792
So, looks like JD Long's method is the winner.
You wrote:
for (i in 1:1000) {if i mod 123345 = 0, a = list(i)} a
JD Long's code is much better, but if you wanted this loopy strategy to work, try instead:
a <- vector(mode="list")
for (i in 1:1000) {
    if (123345 %% i == 0) { a <- c(a, i) }
}
as.vector(unlist(a))
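As a cross-check on what the R snippets should return for 12345 (the value used in the answers, not the 123345 from the question), the same two strategies can be sketched in Python. This is only for comparison and is not a replacement for the R code.

```python
n = 12345  # the value used in the answers above

# Filter in one expression, analogous to i[12345 %% i == 0] in R.
factors = [i for i in range(1, 1001) if n % i == 0]

# Loop that accumulates matches, analogous to the corrected for loop.
acc = []
for i in range(1, 1001):
    if n % i == 0:
        acc.append(i)

print(factors)  # [1, 3, 5, 15, 823]
print(acc == factors)  # True
```

Since 12345 = 3 * 5 * 823, those five values are all of its factors up to 1000.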