Why does OptaPlanner not allow configuring the decay rate in simulated annealing?

I want to use Simulated Annealing in OptaPlanner, but I am a little baffled by the fact that there is only a setting for the initial temperature and not one for the decay rate. What is the reason for this choice?

The cooldown rate is automatically derived from the timeGradient, which is, simply put, 0.0 at the start, 0.5 at half of the spentTime, and 1.0 at all of the spentTime.
But yes, the classic Simulated Annealing method has 2 parameters (starting temperature and cooldown rate). One could implement such an SA pretty easily by copy-pasting the SimulatedAnnealingAcceptor and configuring it in the AcceptorConfig.
That being said, tuning 2 parameters is a pain for users. That's why OptaPlanner's default SA only has 1 parameter that - together with the termination - is translated into the 2 parameters that SA needs.
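To make that concrete, here is a minimal Python sketch of the mechanism. The linear decay of the temperature with the time gradient is an assumption for illustration; the real SimulatedAnnealingAcceptor is Java and has more detail, so treat this as a sketch of the idea, not OptaPlanner's actual implementation:

import math
import random

def sa_accept(score_delta, starting_temperature, time_gradient):
    """Sketch of time-gradient-driven simulated annealing acceptance.

    time_gradient runs from 0.0 (start of solving) to 1.0 (termination),
    so the temperature cools down automatically as the spent time grows.
    """
    temperature = starting_temperature * (1.0 - time_gradient)  # assumed linear decay
    if score_delta >= 0:          # improving or equal moves are always accepted
        return True
    if temperature <= 0:          # fully cooled: behaves like hill climbing
        return False
    return random.random() < math.exp(score_delta / temperature)

# Example: a move that worsens the score by 5, evaluated early vs. late in the run.
print(sa_accept(-5, starting_temperature=100, time_gradient=0.1))   # usually accepted
print(sa_accept(-5, starting_temperature=100, time_gradient=0.99))  # rarely accepted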

Related

SLSQP in ScipyOptimizeDriver only executes one iteration, takes a very long time, then exits

I'm trying to use SLSQP to optimise the angle of attack of an aerofoil to place the stagnation point in a desired location. This is purely as a test case to check that my method for calculating the partials for the stagnation position is valid.
When run with COBYLA, the optimisation converges to the correct alpha (6.04144912) after 47 iterations. When run with SLSQP, it completes one iteration, then hangs for a very long time (10, 20 minutes or more, I didn't time it exactly), and exits with an incorrect value. The output is:
Driver debug print for iter coord: rank0:ScipyOptimize_SLSQP|0
--------------------------------------------------------------
Design Vars
{'alpha': array([0.5])}
Nonlinear constraints
None
Linear constraints
None
Objectives
{'obj_cmp.obj': array([0.00023868])}
Driver debug print for iter coord: rank0:ScipyOptimize_SLSQP|1
--------------------------------------------------------------
Design Vars
{'alpha': array([0.5])}
Nonlinear constraints
None
Linear constraints
None
Objectives
{'obj_cmp.obj': array([0.00023868])}
Optimization terminated successfully. (Exit mode 0)
Current function value: 0.0002386835700364719
Iterations: 1
Function evaluations: 1
Gradient evaluations: 1
Optimization Complete
-----------------------------------
Finished optimisation
Why might SLSQP be misbehaving like this? As far as I can tell, there are no incorrect analytical derivatives when I look at check_partials().
The code is quite long, so I put it on Pastebin here:
core: https://pastebin.com/fKJpnWHp
inviscid: https://pastebin.com/7Cmac5GF
aerofoil coordinates (NACA64-012): https://pastebin.com/UZHXEsr6
You asked two questions whose answers ended up being unrelated to each other:
Why is the model so slow when you use SLSQP, but fast when you use COBYLA?
Why does SLSQP stop after one iteration?
1) Why is SLSQP so slow?
COBYLA is a gradient-free method. SLSQP uses gradients. So the solid bet was that the slowdown happened when SLSQP asked for the derivatives (which COBYLA never did).
That's where I went to look first. Computing derivatives happens in two steps: a) compute partials for each component and b) solve a linear system with those partials to compute totals. The slowdown has to be in one of those two steps.
Since you can run check_partials without too much trouble, step (a) is not likely to be the culprit. So that means step (b) is probably where we need to speed things up.
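For reference, step (a) can be sanity-checked with a single call (a sketch, assuming a Problem instance named prob that has already had setup() and run_model() called on it):

# assuming `prob` is an openmdao.api.Problem that is already set up and run
data = prob.check_partials(compact_print=True)  # compares analytic partials against FD/CS approximations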
I ran the summary utility (openmdao summary core.py) on your model and saw this:
============== Problem Summary ============
Groups: 9
Components: 36
Max tree depth: 4
Design variables: 1 Total size: 1
Nonlinear Constraints: 0 Total size: 0
equality: 0 0
inequality: 0 0
Linear Constraints: 0 Total size: 0
equality: 0 0
inequality: 0 0
Objectives: 1 Total size: 1
Input variables: 87 Total size: 1661820
Output variables: 44 Total size: 1169614
Total connections: 87 Total transfer data size: 1661820
Then I generated an N2 of your model and saw this:
So we have an output vector that is 1169614 elements long, which means your linear system is a matrix that is about 1e6 x 1e6. That's pretty big, and you are using a DirectSolver to try to compute and store a factorization of it. That's the source of the slowdown. Using DirectSolvers is great for smaller models (the rule of thumb is that the output vector should be less than 10000 elements). For larger ones you need to be more careful and use more advanced linear solvers.
In your case we can see from the N2 that there is no coupling anywhere in your model (nothing in the lower triangle of the N2). Purely feed-forward models like this can use a much simpler and faster LinearRunOnce solver (which is the default if you don't set anything else). So I turned off all DirectSolvers in your model, and the derivatives became effectively instant. Make your N2 look like this instead:
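If you want to reproduce that change, it is essentially a one-line swap per group. Here is a minimal, self-contained sketch (the ExecComp stands in for your aerofoil components; the point is only which linear solver gets assigned):

import openmdao.api as om

prob = om.Problem()
model = prob.model
# stand-in component; your aerofoil groups go here instead
model.add_subsystem('comp', om.ExecComp('y = 2*x'), promotes=['*'])

# Feed-forward model: LinearRunOnce (the default) is enough, so either set it
# explicitly or simply don't assign a DirectSolver anywhere in the tree.
model.linear_solver = om.LinearRunOnce()
# model.linear_solver = om.DirectSolver()  # this is what triggers the ~1e6x1e6 factorization

prob.setup()
prob.run_model()
print(prob.compute_totals(of=['y'], wrt=['x']))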
The choice of best linear solver is extremely model dependent. One factor to consider is computational cost, another is numerical robustness. This issue is covered in some detail in Section 5.3 of the OpenMDAO paper, and I won't cover everything here. But very briefly here is a summary of the key considerations.
When just starting out with OpenMDAO, using DirectSolver is both the simplest and usually the fastest option. It is simple because it does not require consideration of your model structure, and it's fast because for small models OpenMDAO can assemble the Jacobian into a dense or sparse matrix and provide that for direct factorization. However, for larger models (or models with very large vectors of outputs), the cost of computing the factorization is prohibitively high. In this case, you need to break the solver structure down more intentionally and use other linear solvers (sometimes in conjunction with the direct solver; see Section 5.3 of the OpenMDAO paper and this OpenMDAO doc).
You stated that you wanted to use the DirectSolver to take advantage of the sparse Jacobian storage. That was a good instinct, but the way OpenMDAO is structured this is not a problem either way. We are pretty far down in the weeds now, but since you asked I'll give a short summary explanation. As of OpenMDAO 3.7, only the DirectSolver requires an assembled Jacobian at all (and in fact, it is the linear solver itself that determines this for whatever system it is attached to). All other LinearSolvers work with a DictionaryJacobian (which stores each sub-jac keyed to the [of-var, wrt-var] pair). Each sub-jac can be stored as dense or sparse (depending on how you declared that particular partial derivative). The dictionary Jacobian is effectively a form of a sparse-matrix, though not a traditional one. The key takeaway here is that if you use the LinearRunOnce (or any other solver), then you are getting a memory efficient data storage regardless. It is only the DirectSolver that changes over to a more traditional assembly of an actual matrix object.
Regarding the issue of memory allocation: I borrowed this image from the OpenMDAO docs.
2) Why does SLSQP stop after one iteration?
Gradient-based optimizations are very sensitive to scaling. I plotted your objective function inside your allowed design space and got this:
So we can see that the minimum is at about 6 degrees, but the objective values are TINY (about 1e-4).
As a general rule of thumb, getting your objective to around order of magnitude 1 is a good idea (we have a scaling report feature that helps with this). I added a ref value that was about the order of magnitude of your objective:
p.model.add_objective('obj', ref=1e-4)
Then I got a good result:
Optimization terminated successfully (Exit mode 0)
Current function value: [3.02197589e-11]
Iterations: 7
Function evaluations: 9
Gradient evaluations: 7
Optimization Complete
-----------------------------------
Finished optimization
alpha = [6.04143334]
time: 2.1188600063323975 seconds
Unfortunately, scaling is just hard with gradient-based optimization. Starting by scaling your objective/constraints to order 1 is a decent rule of thumb, but it's common that you need to adjust things beyond that for more complex problems.

GNURadio Companion Blocks for Z-Wave using RTL-SDR dongle

I'm using an RTL-SDR generic dongle for receiving frames of the Z-Wave protocol. I use real Z-Wave devices. I'm using scapy-radio and I've also downloaded EZ-Wave. However, none of them implements blocks for all Z-Wave data rates, modulations and codings. I've received some frames using the original EZ-Wave solution, but I assume I can't receive frames at all data rates, codings and modulations. Now I'm trying to implement a solution based on their blocks that covers all of them.
The Z-Wave protocol uses these modulations, data rates and codings:
9.6 kbps - FSK - Manchester
40 kbps - FSK - NRZ
100 kbps - GFSK - NRZ
These are my current blocks (not able to receive anything at all right now):
For example, I will explain my view on blocks for receiving at
9.6 kbps - FSK - Manchester
RTL-SDR Source
variable center_freq = 869500000
variable r1_freq_offset = 800e3
Ch0: Frequency: center_freq_3-r1_freq_offset, so I've got 868.7 MHz on the RTL-SDR Source block.
Frequency Xlating FIR Filter
Center frequency = -800 kHz, to get the frequency 868.95 MHz (Europe). To be honest, I'm not sure why I do this and I would welcome an explanation. I'm trying to implement these blocks according to the EZ-Wave implementation of the blocks for 40 kbps - FSK - NRZ (as I assume). They use a sample rate of 2M and different configurations, which I did not understand.
Taps = firdes.low_pass(1,samp_rate_1,samp_rate_1/2,5e3,firdes.WIN_HAMMING). I don't understand what the transition bw should be (5e3 in my case).
Sample rate = 19.2e3, because the data rate/baud is 9.6 kbps and, according to the Nyquist–Shannon sampling theorem, the sampling rate should be at least double the data rate, so 2*9.6 = 19.2. So I'm trying to resample the default 2M from the source down to 19.2k.
Simple squelch
I use the default value (-40) and I'm not sure if I should change this or not.
Quadrature Demod
should do the FSK demodulation; I use the default value for the gain. I'm not sure if this is the right way to do FSK demodulation.
Gain = 2*(samp_rate_1)/(2*math.pi*20e3/8.0)
Low Pass Filter
Sample rate = 19.2k to use the same new sample rate
Cutoff Freq = 9.6k; I assume this according to https://nccgroup.github.io/RFTM/fsk_receiver.html
Transition width = 4.8 which is also sample_rate/2
Clock Recovery MM
Most of the parameters are default.
Omega = 2, because samp_rate/baud = 19.2k/9.6k = 2
Binary Slicer
is for getting the binary code of the signal
Zwave PacketSink 9.6
should do the Manchester decoding.
I would like to ask what I should change in my blocks to achieve proper receiving of Z-Wave frames at all data rates, modulations and codings. When I start receiving, I'm able to see messages from my devices in the FFT sink and Waterfall sink. The message debug doesn't print packets (like the original EZ-Wave solution does) but only
Looking for sync : 575555aa
Looking for sync : 565555aa
Looking for sync : aa5555aa
which, according to the C code for Manchester decoding (ZWave PacketSink 9.6), should be the value in the frame_shift_register. I've seen a similar post, however this is a bit different and, to be honest, I'm stuck here.
I will be grateful for any help.
Let's look at the GFSK case. First of all, the sampling rate of the RTL source, 2 Mbaud, is pretty high. For the maximum data rate, 100 kbps GFSK, a sample rate of, say, 400 to 500 kbaud will do just fine. There is also the power squelch block. This block prevents signals below a certain threshold from passing. This is not good, because it filters out low-power signals that may contain information. There is also the sample rate issue between the lowpass filter and the MM clock recovery block. The output of the symbol recovery block should be 100 kbaud (because for GFSK, sample rate = symbol rate). Using the omega value of 2 and working backward, the input to the MM block should be 200 kbaud. But the lowpass filter produces samples at 2 Mbaud, 10 times more than expected. You have to do proper decimation.
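To make the numbers concrete, here is a rough GNU Radio (3.8-style Python API) sketch of the 100 kbps GFSK path with the decimation done in the xlating filter. The null source/sink, the 40 kHz deviation and the M&M loop gains are placeholders, not values taken from your flowgraph:

#!/usr/bin/env python3
# Hypothetical sketch: decimate 2 Msps down to 200 ksps so that omega = 2
# matches the rate the M&M block actually sees.
import math
from gnuradio import gr, blocks, analog, digital, filter
from gnuradio.filter import firdes

samp_rate   = 2e6                        # RTL-SDR source rate
symbol_rate = 100e3                      # GFSK symbol rate (= bit rate for 2-level FSK)
mm_input    = 2 * symbol_rate            # 200 ksps into the M&M block, so omega = 2
decim       = int(samp_rate // mm_input) # = 10, done in the xlating filter

tb    = gr.top_block()
src   = blocks.null_source(gr.sizeof_gr_complex)           # stand-in for the RTL-SDR source
head  = blocks.head(gr.sizeof_gr_complex, int(samp_rate))  # process ~1 s of samples, then stop
taps  = firdes.low_pass(1, samp_rate, mm_input / 2, 20e3, firdes.WIN_HAMMING)
xlate = filter.freq_xlating_fir_filter_ccf(decim, taps, -800e3, samp_rate)
demod = analog.quadrature_demod_cf(mm_input / (2 * math.pi * 40e3))  # 40 kHz deviation is an assumption
mm    = digital.clock_recovery_mm_ff(mm_input / symbol_rate, 0.25 * 0.175**2, 0.5, 0.175, 0.005)
slc   = digital.binary_slicer_fb()
sink  = blocks.null_sink(gr.sizeof_char)
tb.connect(src, head, xlate, demod, mm, slc, sink)
tb.run()

With the decimation handled in the xlating filter, omega = 2 is consistent with the 200 ksps stream entering the clock recovery block.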
I implemented a GFSK receiver once for our CubeSat. Timing recovery was done by the PFB block, which is more reliable than the MM one. You can find the paper here:
https://www.researchgate.net/publication/309149646_Software-defined_radio_transceiver_for_QB50_CubeSat_telemetry_and_telecommand?_sg=HvZBpQBp8nIFh6mIqm4yksaAwTpx1V6QvJY0EfvyPMIz_IEXuLv2pODOnMToUAXMYDmInec76zviSg.ukZBHrLrmEbJlO6nZbF4X0eyhFjxFqVW2Q50cSbr0OHLt5vRUCTpaHi9CR7UBNMkwc_KJc1PO_TiGkdigaSXZA&_sgd%5Bnc%5D=1&_sgd%5Bncwor%5D=0
Some more details on the receiver could also be found here:
GFSK modulation/demodulation with GNU Radio and USRP
M.
I appreciate your answer; I've changed my sample rates. Now I'm still working on 9.6 kbps, FSK demodulation and Manchester decoding. Currently, the output from my M&M clock recovery looks like this:
I would like to ask what you think about this signal. As I said, it should be FSK demodulation and then I should use Manchester decoding. Do I still need the PFB block? Primarily, I have to do 9.6 kbps, FSK and Manchester, so I will look at 100 kbps GFSK NRZ if there is some time left.
Sample rate is 1M because of RTL-SDR dongle limitations (225001 to 300000 and 900001 to 3200000).
Current blocks:
I don't understand:
Taps of the Frequency Xlating FIR Filter: firdes.low_pass(1,samp_rate_1,40e3,20e3,firdes.WIN_HAMMING)
Cutoff Freq and Transition Width of the Low Pass filter
Clock Recovery M&M as well, so consider its values "random".
ClockRecovery Output:
I was trying to use the PFB block according to your work on ResearchGate. However, I was unsuccessful because I still don't understand all the science behind the clock recovery.
Doing the low-pass filtering twice is because the original Z-Wave blocks from scapy-radio for 40 kbps, FSK and NRZ coding are made like this (and it works):
So I thought it would just be a matter of changing a few parameters and the decoder (Zwave PacketSink 9.6).
I also uploaded my current blocks here.
Moses Browne Mwakyanjala, I'm also trying to implement that thing according to your work.
Maybe there is a problem with the clock recovery and Manchester decoding. Manchester decoding uses the transitions 0->1 and 1->0 to encode 0s and 1s. How can I properly configure the clock recovery to achieve the correct sample rate and transitions for Manchester decoding? The Manchester decoder (Z-Wave PacketSink 9.6) is able to find the preamble and then ends up only looking for sync.
I would also like to ask you where I can find my modulation index "h", mentioned in your work?
Thank you

Time complexity of a genetic algorithm for bin packing

I am trying to explore genetic algorithms (GAs) for the bin packing problem, and compare them to classical Any-Fit algorithms. However, the time complexity of GAs is never mentioned in any of the scholarly articles. Is this because the time complexity is very high, and the main goal of a GA is to find the best solution without considering the time? What is the time complexity of a basic GA?
Assuming that the termination condition is a constant number of iterations, then in general it would look something like this:
O(p * Cp * O(Crossover) * Mp * O(Mutation) * O(Fitness))
p - population size
Cp - crossover probability
Mp - mutation probability
As you can see, it depends not only on parameters like population size but also on the implementation of the crossover and mutation operations and of the fitness function. In practice there would be more parameters, for example chromosome size, etc.
You don't see much about time complexity in publications because researchers most of the time compare GAs using convergence time.
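To see where each of those terms comes from, here is a bare-bones generational GA skeleton (toy selection, crossover and mutation operators; the point is only which loop pays which cost):

import random

def run_ga(fitness, chromo_len, p=100, Cp=0.9, Mp=0.01, iterations=1000):
    pop = [[random.randint(0, 1) for _ in range(chromo_len)] for _ in range(p)]
    for _ in range(iterations):                      # constant number of iterations
        scored = sorted(pop, key=fitness)            # p * O(Fitness) evaluations
        parents = scored[:p // 2]                    # toy truncation selection
        children = []
        while len(children) < p:
            a, b = random.sample(parents, 2)
            if random.random() < Cp:                 # a Cp fraction of pairs pays O(Crossover)
                cut = random.randrange(1, chromo_len)
                a = a[:cut] + b[cut:]
            children.append([1 - g if random.random() < Mp else g
                             for g in a])            # every child pays O(Mutation)
        pop = children
    return min(pop, key=fitness)

best = run_ga(fitness=sum, chromo_len=32)            # example: minimise the number of 1-bits
print(best, sum(best))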
Edit Convergence Time
Every GA has some kind of termination condition, and usually it's a convergence criterion. Let's assume that we want to find the minimum of a mathematical function, so our convergence criterion will be the function's value. In short, we reach convergence during optimization when it's no longer worth continuing because our best individual doesn't improve significantly anymore. Take a look at this chart:
You can see that after around 10000 iterations the fitness doesn't improve much and the line is getting flat. The Best case scenario reaches convergence at around 9500 iterations; after that point we don't observe any improvement, or it's insignificantly small. Assuming that each line shows a different GA, the Best case has the best convergence time because it reaches the convergence criterion first.

How to decide the step size when using Metropolis–Hastings algorithm

I have a simple question regarding the Metropolis–Hastings algorithm.
Suppose the distribution only has one variable $x$ and the value range of $x$ is $s = [-2^{31}, 2^{31}]$.
In the sampling process, I need to propose a new value of x and then decide whether to accept it.
$x_{t+1} = x_t + \epsilon$
If I want to implement it by myself, how do I decide the value of $\epsilon$?
The basic solution is to pick a value from Uniform$[-2^{31}, 2^{31}]$ and set it as $\epsilon$. What if the value range is unbounded, like $[-\infty, \infty]$?
How does the current MCMC library (e.g. pymc) solve that problem?
Suppose you have $d$-dimensional parameters; the optimal scale is approximately $2.4\,d^{-1/2}$ times the scale of the target distribution, which implies optimal acceptance rates of 0.44 for $d = 1$ and 0.23 as $d \to \infty$.
reference: Automatic Step Size Selection in Random Walk Metropolis Algorithms,
Todd L. Graves, 2011.
The best approach is to code a self-tuning algorithm that starts with an arbitrary variance for the step size and tunes this variance as the algorithm progresses. You are shooting for an acceptance rate of 25-50% for the Metropolis algorithm.
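A minimal sketch of that self-tuning idea for a 1-D random-walk Metropolis. The toy target density, the batch size and the 0.44 target rate (the d = 1 value from the rule above) are illustrative choices:

import math
import random

def log_target(x):
    return -0.5 * x * x            # standard normal up to a constant (toy target)

def adaptive_metropolis(n_samples=50_000, target_accept=0.44, batch=100):
    x, step = 0.0, 1.0
    samples, accepted = [], 0
    for i in range(1, n_samples + 1):
        proposal = x + random.gauss(0.0, step)        # x_{t+1} = x_t + epsilon
        if math.log(random.random()) < log_target(proposal) - log_target(x):
            x, accepted = proposal, accepted + 1
        samples.append(x)
        if i % batch == 0:                            # tune the step every `batch` draws
            rate = accepted / batch
            step *= math.exp(0.1 if rate > target_accept else -0.1)
            accepted = 0
    return samples, step

samples, step = adaptive_metropolis()
print("tuned step size:", step)    # settles near 2.4 * the target's scale for d = 1

In practice the adaptation is usually frozen after a burn-in phase, so that the samples you keep come from a proper (non-adapting) Markov chain.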

Standard Errors for Differential Evolution

Is it possible to calculate standard errors for Differential Evolution?
From the Wikipedia entry:
http://en.wikipedia.org/wiki/Differential_evolution
It's not derivative based (indeed, that is one of its strengths), but how then do you calculate the standard errors?
I would have thought some kind of bootstrapping strategy might be applicable, but I can't seem to find any sources that apply bootstrapping to DE.
Baz
Concerning the standard errors, differential evolution is just like any other evolutionary algorithm.
Using a bootstrapping strategy seems a good idea: the usual formulas assume a normal (Gaussian) distribution for the underlying data. That's almost never true for evolutionary computation (exponential distributions being far more common, probably followed by bimodal distributions).
The simplest bootstrap method involves taking the original data set of N numbers and sampling from it to form a new sample (a resample) that is also of size N. The resample is taken from the original using sampling with replacement. This process is repeated a large number of times (typically 1000 or 10000 times), and for each of these bootstrap samples we compute its mean / median (each of these is called a bootstrap estimate).
The standard deviation (SD) of the means is the bootstrapped standard error (SE) of the mean and the SD of the medians is the bootstrapped SE of the median (the 2.5th and 97.5th centiles of the means are the bootstrapped 95% confidence limits for the mean).
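A minimal NumPy sketch of that procedure; the best-of-run values here are just a stand-in array, so substitute the results of your own DE runs:

import numpy as np

rng = np.random.default_rng(0)
best_of_run = rng.exponential(scale=5.0, size=30)    # stand-in for 30 DE runs' best fitness

B = 10_000                                           # number of bootstrap resamples
resamples = rng.choice(best_of_run, size=(B, best_of_run.size), replace=True)
boot_means = resamples.mean(axis=1)
boot_medians = np.median(resamples, axis=1)

se_mean   = boot_means.std(ddof=1)                   # bootstrapped SE of the mean
se_median = boot_medians.std(ddof=1)                 # bootstrapped SE of the median
ci_mean   = np.percentile(boot_means, [2.5, 97.5])   # bootstrapped 95% CI for the mean

print(f"SE(mean) = {se_mean:.3f}, SE(median) = {se_median:.3f}, 95% CI(mean) = {ci_mean}")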
Warnings:
the word population is used with different meanings in different contexts (bootstrapping vs evolutionary algorithm)
in any GA or GP, the average of the population tells you almost nothing of interest. Use the mean/median of the best-of-run
the average of a set that is not normally distributed produces a value that behaves non-intuitively, especially if the probability distribution is skewed: large values in the "tail" can dominate, and the average then tends to reflect the typical value of the "worst" data rather than the typical value of the data in general. In this case the median is better.
Some interesting links are:
A short guide to using statistics in Evolutionary Computation
An Introduction to Statistics for EC Experimental Analysis