Hyperopt set timeouts and modify space during execution - optimization

if someone can help on:
How to set a timeout for each individual test ? a timeout for the total experiment ?
How to setup a progressive strategy which would eliminate/prune a % of worst scoring branches of search space at different stage of the experiment (while using current optimization algorithms) ? ie. at 30% of the max total experiment, it could remove 50% of the worst scoring classifiers and all its branch of hyperparameters to remove it from upcoming tests. Then, same process at 60%...
Thanks a lot!

Following my exchange on hyperopt's github:
there is not a per-trial timeout but hyperopt-sklearn implements its own solution by just wrapping the function. Please look for "fn_with_timeout" at https://github.com/hyperopt/hyperopt-sklearn/ .
from issue 210: "the optimizers are stateless, and fmin stores all state of the experiment in the trials object. So if you remove some experiments from the trials object, it's as if they never happened. use fmin's "max_evals" parameter to interrupt search as often as you need to make these sorts of modifications. It should be fine to use repeated calls with e.g. max_evals increasing by 1 every time if you want really fine grained control."

Thanks for looking into this, #doxav. I've written some code that addresses question 1, taking part of fn_with_timeout from hyperopt-sklearn and adapting it for standard Hyperopt cost functions.
You can find it here:
https://gist.github.com/hunse/247d91d14aaa8f32b24533767353e35d

Related

Set the gap for Gurobi.Optimizer in Julia-JuMP

I am trying to understand how to set the gap for Gurobi.Optimizer, because it solves too long when the default gap threshold is used (10e-4).
I couldn't find it anywhere (they might be referred to as attributes or parameters).
PS: Also, I am trying to find the right tutorial or instructions on how to use the solver in JuMP. I checked here https://juliahub.com/docs/Gurobi/do9v6/0.7.7, but they don't reveal the meanings of different attributes and inputs. Please, send me one in case somebody knows.
Best,
A.
You can set the MIP gap via the MIPGap parameter:
using JuMP, Gurobi
model = Model(Gurobi.Optimizer)
set_optimizer_attribute(model, "MIPGap", 0.1)
You can read more about JuMP here: https://github.com/jump-dev/Gurobi.jl

Repast: check execution time for each method

My model is gradually slower down to an unacceptable speed(i.e. from 200 ticks per second to several seconds for one tick). I'd like to understand what the causes to this problem. What is a simplest way to check which part of the model is increasingly consuming the time? I tried used some other java profiler before but it's not good and difficault to understand.
A Java profiler like YourKit is the best way approach since it will provide the code "hots pots" in terms of the execution times for each class method. Alternatively, you can insert a few timing functions in parts of your model that you suspect contribute to most of the execution time, for example:
long start = System.nanoTime();
// some model code here
long end= System.nanoTime();
System.println("Step A time in seconds: " + (end - start)/1E9);

Setting initial values for non-linear parameters via tabuSearch

I'm trying to fit the lppl model to KLSE index to predict the most probable crash time. Many papers suggested tabuSearch to identify the initial value for non-linear parameters but none of them publish their code. I have tried to fit the mentioned index with the help of NLS And Log-Periodic Power Law (LPPL) in R. But the obtained error and p values are not significant. I believe that the initial values are not accurate. Can anyone help me on how to find the proper initial values?
library(tseries)
library(zoo)
ts<-get.hist.quote(instrument="^KLSE",start="2003-04-18",end="2008-01-30",quote="Close",provider="yahoo",origin="1970-01-01",compression="d",retclass="zoo")
df<-data.frame(ts)
df<-data.frame(Date=as.Date(rownames(df)),Y=df$Close)
df<-df[!is.na(df$Y),]
library(minpack.lm)
library(ggplot2)
df$days<-as.numeric(df$Date-df[1,]$Date)
f<-function(pars,xx){pars$a + (pars$tc - xx)^pars$m *(pars$b+ pars$c * cos(pars$omega*log(pars$tc - xx) + pars$phi))}
resids<-function(p,observed,xx){df$Y-f(p,xx)}
nls.out <- nls.lm(par=list(a=600,b=-266,tc=3000, m=.5,omega=7.8,phi=-4,c=-14),fn = resids, observed = df$Y, xx = df$days, control= nls.lm.control (maxiter =1024, ftol=1e-6, maxfev=1e6))
par<-nls.out$par
nls.final<-nls(Y~(a+(tc-days)^m*(b+c*cos(omega*log(tc-days)+phi))),data=df,start=par,algorithm="plinear",control=nls.control(maxiter=10024,minFactor=1e-8))
summary(nls.final)
I would look at some of the newer research on this topic, there is a good trig modification that will practically guarantee a singular optimization. Additionally, you can use r's built in linear equation solver, to find the linearizable parameters, ergo you will only need to optimize in 3 dimensions. The link below should get you started. I would cite recent literature and personal experience to strongly advise against a tabu search.
https://www.ethz.ch/content/dam/ethz/special-interest/mtec/chair-of-entrepreneurial-risks-dam/documents/dissertation/master%20thesis/MAS_final_Tuncay.pdf

Neon VLD consuming more cycles than what is expected?

I have a simple asm code which loads 12 quad registers of NEON, and have paralleled pairwise add instruction along with the load instruction ( to exploit the dual issue capability). I have verified the code here:
http://pulsar.webshaker.net/ccc/sample-d3a7fe78
As one can see, the code is taking around 13 cycles. But when I load the code on the board, the load instructions seems to take more than one cycle per load, I verified and found out that the VPADAL is taking 1 cycle as stated, but VLD1 is taking more than one cycle. Why is that?
I have taken care of the following:
The address is 16 byte aligned.
Have provided the alignment hint in the instruction vld1.64 {d0, d1} [r0,:128]!
Tried preload instruction pld [r0, #192], at places but that seems to add to the cycles instead of actually reducing the latency.
Can someone tell me what am I doing wrong, why this latency?
Other Details:
With reference to cortex-a8
arm-2009q1 cross compiler tool chain
coding in assembly
Your code is executing much slower than expected because as it's currently written, it's causing the perfect storm of pipeline stalls. On any modern CPU with a pipelined architecture, instructions can execute in one cycle under ideal conditions. The ideal conditions are that the instruction is not waiting for memory and doesn't have any register dependencies. The way you've written the code, you're not allowing for the delay in reading from memory and making the next instruction dependent on the results of the read. This is causing the worst possible performance. Also, I'm not sure why you're accumulating the pairwise adds into multiple registers. Try something like this:
veor.u16 q12,q12,q12 # clear accumulated sum
top_of_loop:
vld1.u16 {q0,q1},[r0,:128]!
vld1.u16 {q2,q3},[r0,:128]!
vpadal.u16 q12,q0
vpadal.u16 q12,q1
vpadal.u16 q12,q2
vpadal.u16 q12,q3
vld1.u16 {q0,q1},[r0,:128]!
vld1.u16 {q2,q3},[r0,:128]!
vpadal.u16 q12,q0
vpadal.u16 q12,q1
vpadal.u16 q12,q2
vpadal.u16 q12,q3
subs r1,r1,#8
bne top_of_loop
Experiment with different numbers of load instructions before executing the adds. The point is that you need to allow time for the read to occur before you can use the target register.
Note: Using Q4-Q7 is risky because they're non-volatile registers. On Android you will get random garbage appearing in these (especially Q4).

If passing a negative number to taskDelay function in vxworks, what happens?

Noted that the parameter of taskDelay is of type int, which means the number could be negative. Just wondering how the function is going to react when passing a negative number.
Most functions would validate the input, and just return early/return 0/set the parameter in question to a default value.
I presume there's no critical need to do this in production, and you probably have some code lying around that you could test with.... why not give it a go?
The documentation doesn't address it, and the only error codes they do define doesn't cover this case. The most correct answer therefore is that the results are undefined.
See the VxWorks / Tornado II FAQ for this gem, however:
taskDelay(-1) shows another bug in
the vxWorks timer/tick code. It has
the (side) effect of setting vxTicks
to zero. This corrupts the localtime
(and probably other things). In fact
taskDelay(x) will have the same effect
if vxTicks + x >= 0x100000000. If the
system clock rate is 100Hz this
happens after about 500 days (because
vxTicks wraps). At faster clock rates
it will happen sooner. Anyone trying
for several years uptime?
Oh there is an undocumented upper
limit on the clock rate. At rates
above 4294 select() will fail to
convert its 'usec' time into the
correct number of ticks. (From: David
Laight, dsl#tadpole.co.uk)
Assuming this bug is old, I would hope that it would either return an error or do the same thing as taskDelay(0), which puts your task at the end of the ready queue.
The task delay tick will be VIRTUALLY 10,9,..,1,0 for taskDelay(10).
The task delay tick will be VIRTUALLY -10,-11,...,-2147483648,2147483647,...,1,0 for taskDelay(-10).