How does the colorforth /mod algorithm work? - division

I've been looking at Chuck Moore's colorForth recently, and I came upon this snippet of code (rendered in a traditional syntax):
: /mod for begin over over . + -if drop 2* [ swap ] next ; then over or or - 2* - next ;
With the following explanation:
Divide operation: trial subtract and shift in either 0 or 1
I'm really confused as to how this implements the full division operation. I realize the 2* shifts in a 0, the - 2* - shifts in a 1, and over or or implements a nip operation. I also understand the mixed loops and if combo.
Here's where I am falling short.
It seems to expect two items on the stack, the numerator and the denominator, which makes sense. However, the initial for pushes the TOS to the return stack, leaving only one item on the data stack. The over over operation needs two values present, however, so I'm not sure what is happening.
He mentions subtraction, but there is no inversion happening, except for the - 2* - branch, which is already mentioned as shifting in a 1.
I'm not sure how you can construct the quotient bit by bit by only shifting in 1s or 0s (into the divisor?).
Some thoughts:
Maybe it depends on the particular word size of the chip Chuck was programming and the rollover after adding enough times
Maybe there is a preamble missing that inverts the denominator, resulting in the subtraction that is mentioned on every loop.
Some idiosyncrasies between colorForth and other Forths:
. is a nop for timing purposes on Chuck's chips.
- is a bitwise inversion, rather than subtraction.
or is exclusive or, rather than inclusive or.
For additional information, here's the source:
Description of function and use of colorForth opcodes
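For orientation, the general technique that /mod is usually read as — trial subtraction, then shifting a 0 or 1 into the quotient — can be sketched in ordinary code. The following is a plain restoring-division sketch in Java, not a transcription of the colorForth word; the method name, bit width, and the non-negative-operand assumption are all mine. Note that on a machine where - is bitwise inversion, a subtraction rem - den would be realized as rem + (inverted den) + 1, or the denominator could be inverted once before the loop, which is consistent with the "missing preamble" guess above.

```java
// Hedged sketch of classic restoring shift-and-subtract division.
// Builds the quotient one bit per iteration, MSB first.
class ShiftDivide {
    // Returns {quotient, remainder} of num / den for non-negative inputs.
    // 'bits' is the word size assumed for the numerator.
    static int[] divMod(int num, int den, int bits) {
        int rem = 0, quot = 0;
        for (int i = bits - 1; i >= 0; i--) {
            rem = (rem << 1) | ((num >>> i) & 1); // shift next numerator bit in
            if (rem >= den) {                     // trial subtraction succeeds
                rem -= den;
                quot = (quot << 1) | 1;           // shift in a 1
            } else {
                quot = quot << 1;                 // shift in a 0
            }
        }
        return new int[] { quot, rem };
    }
}
```

On hardware without a subtract instruction, the `rem -= den` step is where the one's-complemented denominator would be added instead.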

Just for reference: the excellent answer on this question was posted in comp.lang.forth by Ulrich Hoffmann.


Binary search start or end is target

Why is it that when I see example code for binary search there is never an if statement to check if the start of the array or end is the target?
import java.util.Arrays;

public class App {
    public static int binary_search(int[] arr, int left, int right, int target) {
        if (left > right) {
            return -1;
        }
        int mid = (left + right) / 2;
        if (target == arr[mid]) {
            return mid;
        }
        if (target < arr[mid]) {
            return binary_search(arr, left, mid - 1, target);
        }
        return binary_search(arr, mid + 1, right, target);
    }

    public static void main(String[] args) {
        int[] arr = { 3, 2, 4, -1, 0, 1, 10, 20, 9, 7 };
        Arrays.sort(arr);
        for (int i = 0; i < arr.length; i++) {
            System.out.println("Index: " + i + " value: " + arr[i]);
        }
        // The left bound should be the index 0, not the element arr[0].
        System.out.println(binary_search(arr, 0, arr.length - 1, -1));
    }
}
In this example, if the target were -1 or 20, the search would enter recursion. But they added an if statement to check whether the target is at mid, so why not add two more statements also checking whether it's at left or right?
EDIT:
As pointed out in the comments, I may have misinterpreted the initial question. The answer below assumes that OP meant having the start/end checks as part of each step of the recursion, as opposed to checking once before the recursion even starts.
Since I don't know for sure which interpretation was intended, I'm leaving this post here for now.
Original post:
You seem to be under the impression that "they added an extra check for mid, so surely they should also add an extra check for start and end".
The check "Is mid the target?" is in fact not a mere optimization they added. Recursively checking "mid" is the whole point of a binary search.
When you have a sorted array of elements, a binary search works like this:
Compare the middle element to the target
If the middle element is smaller, throw away the first half
If the middle element is larger, throw away the second half
Otherwise, we found it!
Repeat until we either find the target or there are no more elements.
The act of checking the middle is fundamental to determining which half of the array to continue searching through.
Now, let's say we also add a check for start and end. What does this gain us? Well, if at any point the target happens to be at the very start or end of a segment, we skip a few steps and end slightly sooner. Is this a likely event?
For small toy examples with a few elements, yeah, maybe.
For a massive real-world dataset with billions of entries? Hm, let's think about it. For the sake of simplicity, we assume that we know the target is in the array.
We start with the whole array. Is the first element the target? The odds of that are one in a billion. Pretty unlikely. Is the last element the target? The odds of that are also one in a billion. Pretty unlikely too. You've wasted two extra comparisons to speed up an extremely unlikely case.
We limit ourselves to, say, the first half. We do the same thing again. Is the first element the target? Probably not since the odds are one in half a billion.
...and so on.
The bigger the dataset, the more useless the start/end "optimization" becomes. In fact, in terms of (maximally optimized) comparisons, each step of the algorithm has three comparisons instead of the usual one. VERY roughly estimated, that suggests that the algorithm on average becomes three times slower.
Even for smaller datasets, it is of dubious use, since it basically becomes a quasi-linear search instead of a binary search. Yes, the odds are higher, but on average, we can expect a larger number of comparisons before we reach our target.
The whole point of a binary search is to reach the target with as few wasted comparisons as possible. Adding more unlikely-to-succeed comparisons is typically not the way to improve that.
Edit:
The implementation as posted by OP may also confuse the issue slightly. The implementation chooses to make two comparisons between target and mid. A more optimal implementation would instead make a single three-way comparison (i.e. determine ">", "=" or "<" as a single step instead of two separate ones). This is, for instance, how Java's compareTo or C++'s <=> normally works.
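To illustrate that last point, here is a hedged sketch (class and method names are mine) of an iterative binary search that spends one three-way comparison per step, in the spirit of compareTo. Note that on the JVM, Integer.compare still branches internally; the point is that the algorithm's logic consumes a single comparison result per step rather than testing equality and ordering separately.

```java
// One three-way comparison per step: cmp is negative, zero, or positive.
class ThreeWaySearch {
    static int search(int[] arr, int target) {
        int left = 0, right = arr.length - 1;
        while (left <= right) {
            int mid = (left + right) >>> 1;              // overflow-safe midpoint
            int cmp = Integer.compare(target, arr[mid]); // single 3-way compare
            if (cmp == 0) return mid;                    // found
            if (cmp < 0) right = mid - 1;                // continue in left half
            else left = mid + 1;                         // continue in right half
        }
        return -1;                                       // not found
    }
}
```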
BambooleanLogic's answer is correct and comprehensive. I was curious about how much slower this 'optimization' made binary search, so I wrote a short script to test the change in how many comparisons are performed on average:
Given an array of integers 0, ... , N
do a binary search for every integer in the array,
and count the total number of array accesses made.
To be fair to the optimization, I made it so that after checking arr[left] against target, we increase left by 1, and similarly for right, so that every comparison is as useful as possible. You can try this yourself at Try it online
Results:
Binary search on size 10: Standard 29 Optimized 43 Ratio 1.4828
Binary search on size 100: Standard 580 Optimized 1180 Ratio 2.0345
Binary search on size 1000: Standard 8987 Optimized 21247 Ratio 2.3642
Binary search on size 10000: Standard 123631 Optimized 311205 Ratio 2.5172
Binary search on size 100000: Standard 1568946 Optimized 4108630 Ratio 2.6187
Binary search on size 1000000: Standard 18951445 Optimized 51068017 Ratio 2.6947
Binary search on size 10000000: Standard 223222809 Optimized 610154319 Ratio 2.7334
so the total number of comparisons does seem to tend toward triple the standard count, implying the optimization becomes increasingly unhelpful for larger arrays. I'd be curious whether the limiting ratio is exactly 3.
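A minimal version of that experiment can be sketched as follows. This is my own reconstruction in Java (the original script isn't reproduced here, so the exact counts may differ): it counts array accesses for the standard search versus a variant that also probes arr[left] and arr[right] each step and then shrinks the bounds, as described above.

```java
// Count array reads for standard vs. end-checking binary search.
class CompareCount {
    static long accesses;                        // array reads performed

    // Standard binary search: one probe of arr[mid] per step.
    static int standard(int[] a, int t) {
        int l = 0, r = a.length - 1;
        while (l <= r) {
            int mid = (l + r) >>> 1;
            accesses++;
            int v = a[mid];
            if (v == t) return mid;
            if (t < v) r = mid - 1; else l = mid + 1;
        }
        return -1;
    }

    // "Optimized" variant: also probe both ends each step, then shrink.
    static int optimized(int[] a, int t) {
        int l = 0, r = a.length - 1;
        while (l <= r) {
            accesses++;
            if (a[l] == t) return l;
            accesses++;
            if (a[r] == t) return r;
            l++; r--;                             // ends are now ruled out
            if (l > r) break;
            int mid = (l + r) >>> 1;
            accesses++;
            int v = a[mid];
            if (v == t) return mid;
            if (t < v) r = mid - 1; else l = mid + 1;
        }
        return -1;
    }

    // Total accesses over a search for every element of {0, ..., n-1}.
    static long total(int n, boolean opt) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        accesses = 0;
        for (int t = 0; t < n; t++) {
            if (opt) optimized(a, t); else standard(a, t);
        }
        return accesses;
    }
}
```

Running total(n, true) against total(n, false) for growing n shows the end-checking variant doing strictly more work, consistent with the measured ratios above.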
Adding extra checks for the start and end along with the mid value is not a real improvement.
In algorithm design, the main concern is complexity, whether time complexity or space complexity. Most of the time, time complexity is taken as the more important aspect.
To learn more about the binary search algorithm, consider its different use cases, such as:
If the array contains no repeated elements
If the array has repeated elements, in which case you may want to
a) return the leftmost index/value
b) return the rightmost index/value
and many more

Why is the condition in this if statement written as a multiplication instead of the value of the multiplication?

I was reviewing some code from a library for Arduino and saw the following if statement in the main loop:
draw_state++;
if ( draw_state >= 14*8 )
draw_state = 0;
draw_state is a uint8_t.
Why is 14*8 written here instead of 112? I initially thought this was done to save space, as 14 and 8 can both be represented by a single byte, but then so can 112.
I can't see why a compiler wouldn't optimize this to 112, since otherwise it would mean a multiplication has to be done every iteration instead of the lookup of a value. This looks to me like there is some form of memory and processing tradeoff.
Does anyone have a suggestion as to why this was done?
Note: I had a hard time coming up with a clear title, so suggestions are welcome.
Probably to explicitly show where the number 112 came from. For example, it could be the number of bits in 14 bytes (but of course I don't know the context of the code, so I could be wrong). It would then be more obvious to humans where the value came from than writing just 112.
And as you pointed out, the compiler will probably optimize it, so there will be no multiplication in the machine code.
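Taken one step further, the factors can be named. The meanings assumed below for 14 and 8 are purely hypothetical (the real Arduino code's context may differ); the point is that the product is a compile-time constant, folded to 112 with no runtime multiplication:

```java
// Named factors for a derived constant; the compiler folds the product.
class DrawState {
    // Hypothetical meanings; the library's actual context may differ.
    static final int COLUMNS = 14;
    static final int BITS_PER_COLUMN = 8;
    static final int NUM_STATES = COLUMNS * BITS_PER_COLUMN; // folds to 112

    static int advance(int drawState) {
        drawState++;
        if (drawState >= NUM_STATES) drawState = 0; // wrap around
        return drawState;
    }
}
```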

Root finding with a kinked function using NLsolve in Julia

I am currently trying to solve a complementarity problem with a function that features a downward discontinuity, using the mcpsolve() function of the NLsolve package in Julia. The function is reproduced here for specific parameters, and the numbers below refer to the three panels of the figure.
Unfortunately, the algorithm does not always return the interior solution, even though it exists:
In (1), when starting at 0, the algorithm stays at 0, thinking that the boundary constraint binds,
In (2), when starting at 0, the algorithm stops right before the downward jump, even though the solution lies to the right of this point.
These problems occur regardless of the method used - trust region or Newton's method. Ideally, the algorithm would look for potential solutions in the entire set before stopping.
I was wondering if some of you had worked with similar functions, and had found a clever solution to bypass these issues. Note that
Starting to the right of the solution would not solve these problems, as they would also occur for other parametrization - see (3) this time,
It is not known a priori where in the parameter space the particular cases occur.
As an illustrative example, consider the following piece of code. Note that the function is smoother, and yet here as well the algorithm cannot find the solution.
function f!(x, fvec)
    if x[1] <= 1.8
        fvec[1] = 0.1 * (sin(3 * x[1]) - 3)
    else
        fvec[1] = 0.1 * (x[1]^2 - 7)
    end
end
NLsolve.mcpsolve(f!,[0.], [Inf], [0.], reformulation = :smooth, autodiff = true)
Once more, setting the initial value to something other than 0 would only postpone the problem. Also, as pointed out by Halirutan, fzero from the Roots package would probably work, but I'd rather use mcpsolve(), as the problem is originally a complementarity problem.
Thank you very much in advance for your help.
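One generic way around such kinks, sketched here on the example function above (translated from the Julia f!), is to scan the feasible set on a grid for a sign change and then bisect the bracketing interval. This is my own construction, not the mcpsolve() API, and the grid resolution and tolerance are arbitrary choices; for a genuinely discontinuous function, a detected "bracket" may be the jump point rather than a root, so the residual should be checked afterward.

```java
// Grid scan for a sign change, then bisection inside the bracket.
class KinkedRoot {
    // The example function from the question, translated from the Julia f!.
    static double f(double x) {
        return x <= 1.8 ? 0.1 * (Math.sin(3 * x) - 3)
                        : 0.1 * (x * x - 7);
    }

    // Search [lo, hi] on a uniform grid of 'cells' intervals; if f changes
    // sign across a cell, bisect that cell down to width 'tol'.
    static double findRoot(double lo, double hi, int cells, double tol) {
        double step = (hi - lo) / cells;
        for (int i = 0; i < cells; i++) {
            double a = lo + i * step, b = a + step;
            if (f(a) * f(b) <= 0) {               // sign change: root bracketed
                while (b - a > tol) {
                    double mid = 0.5 * (a + b);
                    if (f(a) * f(mid) <= 0) b = mid;
                    else a = mid;
                }
                return 0.5 * (a + b);
            }
        }
        return Double.NaN;                        // no bracket found on the grid
    }
}
```

With the root bracketed this way, a complementarity solver could then be restarted from inside the bracketing interval instead of from 0.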

Eigen: Computation and incrementation of only Upper part with selfAdjointView?

I do something like this to get:
bMRes += MatrixXd(n, n).setZero()
.selfadjointView<Eigen::Upper>().rankUpdate(bM);
This increments bMRes by bM * bM.transpose(), but twice as fast.
Note that bMRes and bM are of type Map<MatrixXd>.
To optimize things further, I would like to skip the copy (and incrementation) of the Lower part.
In other words, I would like to compute and write only the Upper part.
Again, in other words, I would like to have my result in the Upper part and 0's in the Lower part.
If it is not clear enough, feel free to ask questions.
Thanks in advance.
Florian
If your bMRes is self-adjoint originally, you could use the following code, which only updates the upper half of bMRes.
bMRes.selfadjointView<Eigen::Upper>().rankUpdate(bM);
If not, I think you have to accept that .selfadjointView<>() will always copy the other half when assigned to a MatrixXd.
Compared to A*A.transpose() or .rankUpdate(A), the cost of copying half of the result matrix can be ignored when A is reasonably large. So I guess you don't need to optimize your code further.
If you just want to evaluate the difference, you could use low-level BLAS APIs: A*A.transpose() is equivalent to gemm(), and .rankUpdate(A) is equivalent to syrk(), but syrk() doesn't copy the other half automatically.

J's # operator: why not reversed?

I've been studying J for the last few weeks, and something that has really bugged me is the dyadic case of the # operator: the only way I've used it so far is similar to the following:
(1 p: a) # a
If this were reversed, the parentheses could be omitted:
a #~ 1 p: a
Why was it decided not to reverse the current order of arguments? Backward familiarity with APL, or something I'm completely overlooking?
In general, J's primitives are designed to take "primary data" on the right, and "control data" on the left.
The distinction between "primary" and "control" data is not sharp, but in general one would expect "primary" data to vary more often than "control" data. That is, one would expect that the "control" data is less likely to be calculated than the "primary" data.
The reason for that design choice is exactly as you point out: if the data that is more likely to be calculated (as opposed to fixed in advance) appears on the right, then more J phrases can be expressed as simple trains or pipelines of verbs, without excessive parenthesization (given that J executes right-to-left).
Now, in the case of #, which data is more likely to be calculated? You're 100% right that the filter (or mask) is very likely to be calculated. However, the data to be filtered is almost certain to be calculated. Where did you get your a, for example?
QED.
PS: if your a can be calculated by some J verb, as in a=: ..., then your whole result, filter and all, can be expressed with primeAs =: 1&p: # ... .
PPS: Note the 1&p:, there. That's another example of "control" vs "primary": the 1 is control data - you can tell because it's forever bound to p: - and it's fixed. And so, uncoincidentally, p: was designed to take it as a left argument.
PPPS: This concept of "control data appears on the left" has been expressed many different ways. You can find one round-up of veteran Jers' explanations here: http://www.jsoftware.com/pipermail/general/2007-May/030079.html .