SMO algorithm running into infinite loop? - optimization

I'm interested in building an SVM multi-class classifier, so I am currently implementing sequential minimal optimization (SMO).
My implementation is based on the pseudocode in "Fast Training of Support Vector Machines using Sequential Minimal Optimization" by John C. Platt.
I observed that for certain training examples, the SMO algorithm may diverge and run into an infinite loop.
The following loop in the main routine
numChanged = 0;
examineAll = 1;
while (numChanged > 0 || examineAll >0) {…}
may run into an infinite loop.
Is there a clue or criterion to prevent the SMO routine from running into an infinite loop?
Thank you in advance for your answers.
Regards

You can add a max-iteration condition if you want:
while ((numChanged > 0 || examineAll) && iter < MaxIter)
but in most cases it shouldn't run into an infinite loop. This is Platt's full pseudocode:
while (numChanged > 0 || examineAll)
{
    numChanged = 0;
    // Adding curly brackets for better readability
    if (examineAll)
    {
        loop I over all training examples
            numChanged += examineExample(I);
    }
    else
    {
        loop I over examples where alpha is not 0 & not C
            numChanged += examineExample(I);
    }
    if (examineAll == 1)
    {
        examineAll = 0;
    }
    else if (numChanged == 0)
    {
        examineAll = 1;
    }
}
Notice what it is doing: one pass examines every example, and the following passes only examine the examples whose alpha is neither 0 nor C; when such a pass changes nothing, the algorithm falls back to a full pass. If nothing changes during that "examine all" pass either, the while condition becomes false, stopping the loop.
So, for it to get stuck in an infinite loop there must be a corner case (probably a numerical error) that introduces oscillations: examples keep changing during the "examine all" pass but not during the pass that only looks at alphas that are neither 0 nor C.
Usually, if the data is normalized to [-1,1] or [0,1] and the parameters of the algorithm have reasonable values, those corner cases are rare. In any case, if you want to be extra careful you can add the max-iter safety net.
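For completeness, here is a minimal sketch of that outer loop with the safety net in place. examineExample is your existing SMO step; alpha, C, eps, numExamples and kMaxIter are placeholders standing in for whatever your implementation already uses:

#include <functional>
#include <vector>

// Sketch only: examineExample is your existing SMO step; alpha, C, eps and
// numExamples stand in for whatever your implementation already defines.
void smoMainRoutine(int numExamples, const std::vector<double>& alpha,
                    double C, double eps,
                    const std::function<int(int)>& examineExample)
{
    const int kMaxIter = 10000;       // safety net against numerical oscillation
    int  iter       = 0;
    int  numChanged = 0;
    bool examineAll = true;

    while ((numChanged > 0 || examineAll) && iter < kMaxIter) {
        numChanged = 0;
        if (examineAll) {
            for (int i = 0; i < numExamples; ++i)
                numChanged += examineExample(i);
        } else {
            for (int i = 0; i < numExamples; ++i)
                if (alpha[i] > eps && alpha[i] < C - eps)   // neither 0 nor C, with tolerance
                    numChanged += examineExample(i);
        }
        if (examineAll)
            examineAll = false;        // next passes: only non-bound alphas
        else if (numChanged == 0)
            examineAll = true;         // nothing changed, go back to a full pass
        ++iter;
    }
}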

What is the time complexity of the function below?

I was reading a book about competitive programming and encountered a problem where we have to count all possible paths in an n*n matrix.
Now the conditions are:
1. All cells must be visited exactly once (no cell may be left unvisited or visited more than once)
2. The path should start at (1,1) and end at (n,n)
3. Possible moves are right, left, up, and down from the current cell
4. You cannot go out of the grid
Now this is my code for the problem:
#include <vector>
using namespace std;

typedef long long ll;

ll path_count(ll n, vector<vector<bool>>& done, ll r, ll c) {
    ll count = 0;
    done[r][c] = true;
    if (r == (n - 1) && c == (n - 1)) {
        // only count the path if every cell has been visited
        for (ll i = 0; i < n; i++) {
            for (ll j = 0; j < n; j++) {
                if (!done[i][j]) {
                    done[r][c] = false;
                    return 0;
                }
            }
        }
        count++;
    }
    else {
        if ((r + 1) < n  && !done[r + 1][c]) count += path_count(n, done, r + 1, c);
        if ((r - 1) >= 0 && !done[r - 1][c]) count += path_count(n, done, r - 1, c);
        if ((c + 1) < n  && !done[r][c + 1]) count += path_count(n, done, r, c + 1);
        if ((c - 1) >= 0 && !done[r][c - 1]) count += path_count(n, done, r, c - 1);
    }
    done[r][c] = false;
    return count;
}
Here, if we define a recurrence relation, it could be: T(n) = 4T(n-1) + n^2.
Is this recurrence relation correct? I don't think so, because the master theorem would then give O(4^n * n^2), and I don't think it can be of that order.
The reason I say this is that for a 7*7 matrix it takes around 110.09 seconds, and I don't think O(4^n * n^2) should take that much time for n=7.
If we calculate it for n=7, the approximate number of operations would be 4^7 * 7^2 = 802816 ~ 10^6. For that many operations it should not take that much time. So I conclude that my recurrence relation is wrong.
This code outputs 111712 for n=7, which matches the book's output, so the code is correct.
So what is the correct time complexity?
No, the complexity is not O(4^n * n^2).
Consider the 4^n in your notation. That would mean recursing to a depth of at most n (7 in your case), with 4 choices at each level. But this is not the case: at the 8th level you still have multiple choices of where to go next. In fact, you keep branching until you complete a path, which has depth n^2.
So a loose bound gives us O(4^(n^2) * n^2). This bound, however, is far from tight, as it assumes you have 4 valid choices in each recursive call, which is not the case.
I am not sure how much tighter it can be, but a first improvement drops it to O(3^(n^2) * n^2), since you never go back to the cell you came from. This bound is still far from optimal.
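If you want to sanity-check the branching argument empirically, one option (a sketch, not code from the book) is to add a call counter to the same recursion and watch how fast it grows with n:

#include <iostream>
#include <vector>
using namespace std;
typedef long long ll;

static ll calls = 0;   // counts every invocation of path_count

ll path_count(ll n, vector<vector<bool>>& done, ll r, ll c) {
    ++calls;
    ll count = 0;
    done[r][c] = true;
    if (r == n - 1 && c == n - 1) {
        // only count the path if every cell has been visited
        bool all_visited = true;
        for (ll i = 0; i < n && all_visited; i++)
            for (ll j = 0; j < n; j++)
                if (!done[i][j]) { all_visited = false; break; }
        if (all_visited) count = 1;
    } else {
        if (r + 1 < n  && !done[r + 1][c]) count += path_count(n, done, r + 1, c);
        if (r - 1 >= 0 && !done[r - 1][c]) count += path_count(n, done, r - 1, c);
        if (c + 1 < n  && !done[r][c + 1]) count += path_count(n, done, r, c + 1);
        if (c - 1 >= 0 && !done[r][c - 1]) count += path_count(n, done, r, c - 1);
    }
    done[r][c] = false;
    return count;
}

int main() {
    for (ll n = 3; n <= 6; n++) {
        calls = 0;
        vector<vector<bool>> done(n, vector<bool>(n, false));
        cout << "n=" << n << ": " << path_count(n, done, 0, 0)
             << " paths, " << calls << " calls\n";
    }
}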

How to avoid usize going negative?

I'm translating a chunk (2000 lines) of proprietary C code into Rust. In C, it is common to run a pointer, array index, etc. down, for as long as it is non-negative. In Rust, simplified to the bone, it would look something like:
while i >= 0 && more_conditions {
    more_work;
    i -= 1;
}
Of course, when i is usize, you get an underflow from the subtraction. I have learned to work around this by using for loops with .rev(), offsetting my indexes by one, or using a different type and casting with as usize, etc.
Usually it works, and usually I can make it legible, but the code I'm modifying is chock-full of indexes running towards each other, eventually tested with i_low > i_high.
Something like this (in Rust):
loop {
    while condition1(i_low) { i_low += 1; }
    while condition2(i_high) { i_high -= 1; }
    if i_low > i_high { return something; }
    do_something_else;
}
Every now and then this panics, as i_high runs past 0.
I have been inserting a lot of i_high >= 0 && guards in the code, and it has become a lot less readable.
How do experienced Rust programmers avoid usize variables going to -1?
for loops? for i in (0..size).rev()
casting? i as usize, after checking for i < 0
offsetting your variable by one, and using i-1 when safe?
extra conditionals?
catching exceptions?
Or do you just eventually learn to write programs around these situations?
Clarification: The C code is not broken - it has supposedly been in production for ten years, structuring video segments on multiple servers 24/7. It is just not following Rust conventions - it often returns -1 as an index, it recurses with -1 for the low index of an array to process, and indexes go negative all the time. All of these are handled before problems occur - ugly, but functional. Something like:
incident_segment = detect_incident(array, start, end);
attach(array, incident_segment);
store(array, start, incident_segment - 1);
process(array, incident_segment + 1, end);
In the above code, every single one of the three resulting calls may be getting a segment index that's -1 (attach, store) or out of bounds (process). It's handled, but after the call.
My Rust code appears to be working as well. As a matter of fact, in order to deal with the negative usize, I added additional logic that pruned a number of recursions, so it runs about as fast as the C code (apparently faster, but that's also because I distributed the output across multiple drives).
The issue is that the client does not want a full rewrite, and wants the 'native' programmers to be able to check the two programs against each other. Based on the answers so far, I'm thinking that using i64 and casting/shadowing as needed may be the best way to produce code that's easy to read for the 'natives'. Which I personally do not have to like...
If you want to do it idiomatically:
for j in (0..=i).rev() {
    if conditions {
        break;
    }
    // use j as your new i here
}
Note the use of ..=i in the iterator: this means it will actually iterate including i, i.e. over [0, 1, 2, ..., i-1, i] (reversed); otherwise you would end up with [0, 1, 2, ..., i-2, i-1].
Otherwise, here is the code:
while (i as isize - 1) != -2 && more_conditions {
    more_work;
    i = i.wrapping_sub(1); // wrapping_sub avoids the debug-mode panic when i goes below 0
}
I'd probably start by using saturating_sub (and _add for parallel structure):
while condition1(i_low) { i_low = i_low.saturating_add(1); }
while condition2(i_high) { i_high = i_high.saturating_sub(1); }
You need to be careful to ensure that your logic handles the value saturating at zero. You could also use more C-like semantics with wrapping_sub.
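For example, here is a minimal sketch of the saturating approach with an explicit check for hitting zero; condition1, condition2 and the sample data are placeholders, not your real logic:

// Minimal sketch: the predicates and data are made up; the point is that the
// "went below zero" case becomes an explicit early return instead of a panic.
fn condition1(x: u8) -> bool { x < 10 }
fn condition2(x: u8) -> bool { x > 200 }

fn find_bounds(v: &[u8]) -> Option<(usize, usize)> {
    let mut i_low: usize = 0;
    let mut i_high: usize = v.len().checked_sub(1)?; // None for an empty slice

    while i_low < v.len() && condition1(v[i_low]) {
        i_low = i_low.saturating_add(1);
    }
    while condition2(v[i_high]) {
        if i_high == 0 {
            return None; // the C version would have run to -1 here
        }
        i_high = i_high.saturating_sub(1);
    }
    if i_low > i_high { None } else { Some((i_low, i_high)) }
}

fn main() {
    let data = [3u8, 7, 50, 120, 250, 210];
    println!("{:?}", find_bounds(&data)); // prints Some((2, 3))
}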
Truthfully, there's no one-size-fits-all solution. Many times, complicated logic becomes simpler if you abstract it a bit, or turn it slightly sideways. You haven't provided any concrete examples, so we cannot give any useful advice. I solve way too many problems with iterators, so that's often my first solution.
catching exceptions
Absolutely not. That's exceedingly inefficient and non-idiomatic.

Chisel, Generate Blocks and Large Intermediate/Output Files

Is there a construct in Chisel that emits Verilog generate blocks, instead of unrolling Scala for-loops into very large (>100k line) Verilog and FIRRTL output files?
For example, I have the following code, which constructs a 2D lattice of MatrixElement modules and connects their inputs and outputs.
private val mat_elems = Seq.tabulate(rows, cols) { (i, j) =>
  Module(new MatrixElement(n = i, m = j))
}

for (i <- 0 until rows; j <- 0 until cols) {
  // Wavefront propagation
  if (i == 0 && j != 0) {
    // First row
    mat_elems(i)(j).io.in <> (false.B, false.B, mat_elems(i)(j - 1).io.out)
  } else if (i != 0 && j == 0) {
    // First col
    mat_elems(i)(j).io.in <> (false.B, mat_elems(i - 1)(j).io.out, false.B)
  } else if (i >= 1 && j >= 1) {
    // Internal matrix
    mat_elems(i)(j).io.in <> (mat_elems(i - 1)(j - 1).io.out,
                              mat_elems(i - 1)(j).io.out,
                              mat_elems(i)(j - 1).io.out)
  }
}
I am looking to compile this code for values of rows and cols >= 256. So this matrix gets very large in size.
If I were writing this as a Verilog module, I would make use of generate blocks. However, in Chisel, since I am using Scala loops, the entire lattice/matrix gets unrolled in the FIRRTL/Verilog outputs, often producing >100k lines with all the _T* wires for 512x512 lattices. This causes a whole bunch of JVM out-of-memory errors during Chisel compilation and makes VCS simulation of the output files VERY slow (just parsing the files takes forever).
Is there any way around this? Maybe get Chisel to generate Verilog generate blocks?
There is no support in either Chisel or FIRRTL for compressing this. Such a feature would probably be pretty useful, but we have no plan or timeline for it. You can always use a BlackBox and write the Verilog yourself if you find the compilation time saved to be worth it.
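If you go the BlackBox route, a rough sketch could look like the following; the module name, ports, parameters and the MatrixLattice.v resource file are placeholders (not a drop-in for the code above), and the hand-written Verilog with the generate blocks would live under src/main/resources:

import chisel3._
import chisel3.util.HasBlackBoxResource
import chisel3.experimental.IntParam

// Sketch: declare only the interface in Chisel and keep the lattice itself
// in hand-written Verilog (with generate blocks) on the resource path.
class MatrixLattice(rows: Int, cols: Int)
    extends BlackBox(Map("ROWS" -> IntParam(rows), "COLS" -> IntParam(cols)))
    with HasBlackBoxResource {
  val io = IO(new Bundle {
    val clock = Input(Clock())
    val in    = Input(UInt(cols.W))   // placeholder ports
    val out   = Output(UInt(cols.W))
  })
  addResource("/MatrixLattice.v")     // Verilog file containing the generate blocks
}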

Branch prediction and optimized code

I have the following two code blocks; the purpose of both blocks is the same.
I had to implement the second block to avoid inverted logic and to increase readability.
BTW, in the production code the condition is very complex.
The question is: I know branching is bad, but how much of a penalty do I have to pay?
Just as extra information, please also consider that the probability of the else branch is very high.
X = Get_XValue();
if (X != 5)
{
    K = X + 3;
    .
    .
}

X = Get_XValue();
if (X == 5)
{
    /* do nothing */
}
else
{
    K = X + 3;
    .
    .
}
This all comes down to your compiler. A good optimizing compiler will detect that the then-clause in the second example is empty and reverse the test. Thus it will generate the same code for both cases so there should be no penalty at all.
As a side note, I can add that this was the case for all three compilers I tried (clang, gcc, and iccarm).
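If you want to verify this for your own compiler, one way (a sketch, with Get_XValue and K declared extern just so it compiles) is to put both versions in one file, compile with optimization, and compare the generated assembly, e.g. gcc -O2 -S branch.c:

/* Sketch for checking the claim: with optimization on, the two functions
   typically compile to identical code. */
extern int Get_XValue(void);
extern int K;

void version1(void)
{
    int X = Get_XValue();
    if (X != 5)
    {
        K = X + 3;
    }
}

void version2(void)
{
    int X = Get_XValue();
    if (X == 5)
    {
        /* do nothing */
    }
    else
    {
        K = X + 3;
    }
}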

Faster calculation for large amounts of data / inner loop

So, I am programming a simple Mandelbrot renderer.
My inner loop (which is executed up to ~100,000,000 times each time I draw on screen) looks like this:
Complex position = {re, im};
Complex z = {0.0, 0.0};
uint32_t it = 0;
for (; it < maxIterations; it++)
{
    // Square z
    double old_re = z.re;
    z.re = z.re*z.re - z.im*z.im;
    z.im = 2*old_re*z.im;
    // Add c
    z.re = z.re + position.re;
    z.im = z.im + position.im;
    // Exit condition (mod(z) > 5)
    if (sqrt(z.re*z.re + z.im*z.im) > 5.0f)
        break;
}
// Color in the pixel according to the value of 'it'
Just some very simple calculations. This takes between 0.5 and a couple of seconds, depending on the zoom and so on, but I need it to be much faster to enable (almost) smooth scrolling.
My question is: What is my best bet to achieve the maximum possible calculation speed?
OpenCl to use the GPU? Coding it in assembly? Dividing the image into small pieces and dispatch the calculation of each piece on another thread? A combination of those?
Any help is appreciated!
I have written a Mandelbrot set renderer several times... and here are the things that you should keep in mind...
1. The points that take the longest are the ones that never escape and use up all the iterations.
   a. So you can carve out a region in the middle made of a few rectangles and check that first.
2. Any starting point inside the main cardioid or the period-2 bulb (for example, anything within a distance of 0.25 from the origin) will never escape.
3. You can cache the last few points (20 or 30) in a rolling buffer; if you ever see a point in the buffer that you just calculated, you have a cycle and the orbit will never escape.
4. You can use a more general test that doesn't require a square root: if either part is less than -2 or greater than 2, the orbit will race out of control and can be considered escaped.
5. You can also break the work up, because each point is independent, so you can use a separate thread or GCD dispatch or whatever for each row or quadrant... it is a very easy problem to divide up and run in parallel.
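To illustrate point 3, here is a minimal sketch of the rolling-buffer cycle check; the buffer size, the epsilon tolerance and the function name iterate are arbitrary choices, not taken from the question:

#include <math.h>
#include <stdint.h>

#define HISTORY 20   /* arbitrary buffer size */

typedef struct { double re, im; } Complex;

/* Sketch of the rolling-buffer cycle check: if z (almost) repeats, the orbit
   is periodic and will never escape, so we can stop iterating early. */
uint32_t iterate(Complex position, uint32_t maxIterations)
{
    Complex z = {0.0, 0.0};
    Complex history[HISTORY] = {{0.0, 0.0}};
    const double eps = 1e-14;

    for (uint32_t it = 0; it < maxIterations; it++) {
        /* z = z*z + c */
        double old_re = z.re;
        z.re = z.re * z.re - z.im * z.im + position.re;
        z.im = 2.0 * old_re * z.im + position.im;

        /* escape test without the square root: |z|^2 > 25 */
        if (z.re * z.re + z.im * z.im > 25.0)
            return it;

        /* cycle detection: have we seen (almost) this z before? */
        for (int k = 0; k < HISTORY; k++) {
            if (fabs(z.re - history[k].re) < eps &&
                fabs(z.im - history[k].im) < eps)
                return maxIterations;   /* periodic orbit, never escapes */
        }
        history[it % HISTORY] = z;
    }
    return maxIterations;
}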
In addition to the comments by @Grady Player, you could start just by optimising your code:
// Add c
z.re += position.re;
z.im += position.im;
// Exit condition (mod(z) > 5)
if (z.re*z.re + z.im*z.im > 25.0f)
    break;
The compiler may optimise the first, but the second will certainly help.
Why are you coding your own complex rather than using complex.h?
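For reference, a sketch of the same inner loop written with the standard C99 complex type; whether it is faster than the hand-rolled struct depends on your compiler, so measure it:

#include <complex.h>
#include <stdint.h>

/* Sketch: same iteration using complex.h; the function name is a placeholder. */
uint32_t iterate_std(double re, double im, uint32_t maxIterations)
{
    double complex c = re + im * I;
    double complex z = 0.0;
    uint32_t it = 0;
    for (; it < maxIterations; it++) {
        z = z * z + c;
        /* |z|^2 > 25, i.e. mod(z) > 5, without the square root */
        if (creal(z) * creal(z) + cimag(z) * cimag(z) > 25.0)
            break;
    }
    return it;
}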