Is the result of a global MINLP solver a global optimum?

I used Pyomo with the Couenne solver, which is a global solver for MINLP, but the result is not a global optimum!
Is it possible that the result of a global solver is not a global optimum?
What is the problem?
The solver log is:
Solver command line: ['C:\\Users\\~\\couenne.exe', 'C:\\Users\\~\\AppData\\Local\\Temp\\tmpv6y0s3cc.pyomo.nl', '-AMPL']
couenne:
ANALYSIS TEST: Problem size before reformulation: 16 variables (6 integer), 37 constraints.
Problem size after reformulation: 54 variables (6 integer), 37 constraints.
*****************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
Ipopt is released as open source code under the Common Public License (CPL).
For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************
NOTE: You are using Ipopt by default with the MUMPS linear solver.
Other linear solvers might be more efficient (see Ipopt documentation).
Cbc0031I 8 added rows had average density of 2
Cbc0013I At root node, 8 cuts changed objective from -20.3231 to -20.235 in 10 passes
Cbc0014I Cut generator 0 (Couenne convexifier cuts) - 48 row cuts average 2.0 elements, 2 column cuts (10 active)
Cbc0010I After 0 nodes, 0 on tree, 1e+050 best solution, best possible -20.235 (0.16 seconds)
Cbc0004I Integer solution of -10.0826 found after 831 iterations and 63 nodes (0.81 seconds)
Cbc0010I After 100 nodes, 38 on tree, -10.0826 best solution, best possible -15.9137 (0.89 seconds)
Cbc0004I Integer solution of -10.5219 found after 1264 iterations and 114 nodes (0.91 seconds)
Cbc0010I After 200 nodes, 26 on tree, -10.5219 best solution, best possible -11.899 (1.44 seconds)
Cbc0010I After 300 nodes, 29 on tree, -10.5219 best solution, best possible -10.9884 (1.61 seconds)
Cbc0010I After 400 nodes, 13 on tree, -10.5219 best solution, best possible -10.5235 (1.72 seconds)
Cbc0001I Search completed - best objective -10.52192111756512, took 3256 iterations and 479 nodes (1.80 seconds)
Cbc0035I Maximum depth 36, 0 variables fixed on reduced cost
Couenne convexifier cuts was tried 939 times and created 3190 cuts of which 548 were active after adding rounds of cuts
"Finished"

Related

Getting "DUAL_INFEASIBLE" when solving a very simple linear programming problem

I am solving a simple LP problem using Gurobi with dual simplex and presolve. Gurobi reports that the model is unbounded, but I can't see why such a model is unbounded. Can anyone help me figure out where it goes wrong?
I attached the log and also the content in the .mps file.
Thanks very much in advance.
Kind regards,
Hongyu.
The output log and .mps file:
Link to the .mps file: https://studntnu-my.sharepoint.com/:u:/g/personal/hongyuzh_ntnu_no/EV5CBhH2VshForCL-EtPvBUBiFT8uZZkv-DrPtjSFi8PGA?e=VHktwf
Gurobi Optimizer version 9.5.2 build v9.5.2rc0 (mac64[arm])
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 1 rows, 579 columns and 575 nonzeros
Coefficient statistics:
Matrix range [3e-02, 5e+01]
Objective range [7e-01, 5e+01]
Bounds range [0e+00, 0e+00]
RHS range [7e+03, 7e+03]
Iteration Objective Primal Inf. Dual Inf. Time
0 handle free variables 0s
Solved in 0 iterations and 0.00 seconds (0.00 work units)
Unbounded model
The easiest way to debug this is to put a bound on the objective, so the model is no longer unbounded. Then inspect the solution. This is a simple trick that surprisingly few people know about.
When we do this with a bound of 100000, we see:
phi = 100000.0000
gamma[11] = -1887.4290
(the rest are zero). Indeed, we can make gamma[11] as negative as we want while still obeying R0. Note that gamma[11] does not appear in the objective.
More advice: It is also useful to write out the LP file of the model and study that carefully. You probably would have caught the error and that would have prevented this post.
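Here is a minimal gurobipy sketch of both suggestions. The filename and variable name are assumptions based on the log above, not the actual attached files:

import gurobipy as gp

m = gp.read("model.mps")       # hypothetical filename for the attached model
m.write("model.lp")            # human-readable LP file, worth studying

phi = m.getVarByName("phi")    # assumes the objective variable is named "phi"
m.addConstr(phi <= 100000, name="debug_bound")
m.optimize()

# Print the nonzero solution values to see which variables run away.
for v in m.getVars():
    if abs(v.X) > 1e-6:
        print(v.VarName, v.X)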

SLSQP in ScipyOptimizeDriver only executes one iteration, takes a very long time, then exits

I'm trying to use SLSQP to optimise the angle of attack of an aerofoil to place the stagnation point in a desired location. This is purely a test case to check that my method for calculating the partials of the stagnation position is valid.
When run with COBYLA, the optimisation converges to the correct alpha (6.04144912) after 47 iterations. When run with SLSQP, it completes one iteration, then hangs for a very long time (10 or 20 minutes or more; I didn't time it exactly) and exits with an incorrect value. The output is:
Driver debug print for iter coord: rank0:ScipyOptimize_SLSQP|0
--------------------------------------------------------------
Design Vars
{'alpha': array([0.5])}
Nonlinear constraints
None
Linear constraints
None
Objectives
{'obj_cmp.obj': array([0.00023868])}
Driver debug print for iter coord: rank0:ScipyOptimize_SLSQP|1
--------------------------------------------------------------
Design Vars
{'alpha': array([0.5])}
Nonlinear constraints
None
Linear constraints
None
Objectives
{'obj_cmp.obj': array([0.00023868])}
Optimization terminated successfully. (Exit mode 0)
Current function value: 0.0002386835700364719
Iterations: 1
Function evaluations: 1
Gradient evaluations: 1
Optimization Complete
-----------------------------------
Finished optimisation
Why might SLSQP be misbehaving like this? As far as I can tell, there are no incorrect analytical derivatives when I look at check_partials().
The code is quite long, so I put it on Pastebin here:
core: https://pastebin.com/fKJpnWHp
inviscid: https://pastebin.com/7Cmac5GF
aerofoil coordinates (NACA64-012): https://pastebin.com/UZHXEsr6
You asked two questions whose answers turned out to be unrelated to each other:
Why is the model so slow when you use SLSQP but fast when you use COBYLA?
Why does SLSQP stop after one iteration?
1) Why is SLSQP so slow?
COBYLA is a gradient-free method; SLSQP uses gradients. So the solid bet was that the slowdown happened when SLSQP asked for the derivatives (which COBYLA never did).
That's where I went to look first. Computing derivatives happens in two steps: (a) compute partials for each component, and (b) solve a linear system with those partials to compute totals. The slowdown has to be in one of those two steps.
Since you can run check_partials without too much trouble, step (a) is not likely to be the culprit. So step (b) is probably where we need to speed things up.
I ran the summary utility (openmdao summary core.py) on your model and saw this:
============== Problem Summary ============
Groups: 9
Components: 36
Max tree depth: 4
Design variables: 1 Total size: 1
Nonlinear Constraints: 0 Total size: 0
equality: 0 0
inequality: 0 0
Linear Constraints: 0 Total size: 0
equality: 0 0
inequality: 0 0
Objectives: 1 Total size: 1
Input variables: 87 Total size: 1661820
Output variables: 44 Total size: 1169614
Total connections: 87 Total transfer data size: 1661820
Then I generated an N2 diagram of your model.
We have an output vector that is 1169614 elements long, which means the linear system is a matrix of roughly 1e6 x 1e6. That's pretty big, and you are using a DirectSolver to compute and store a factorization of it. That's the source of the slowdown. DirectSolvers are great for smaller models (rule of thumb: the output vector should be under 10,000 elements); for larger ones you need to be more careful and use more advanced linear solvers.
In your case we can see from the N2 that there is no coupling anywhere in your model (nothing in the lower triangle of the N2). Purely feed-forward models like this can use the much simpler and faster LinearRunOnce solver (which is the default if you don't set anything else). So I turned off all the DirectSolvers in your model, and the derivatives became effectively instant.
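Here is a minimal sketch of the idea on a tiny feed-forward model (not yours): simply avoid assigning a DirectSolver, or set LinearRunOnce explicitly.

import openmdao.api as om

p = om.Problem()
# Two feed-forward components; no coupling, so no factorization is needed.
p.model.add_subsystem('a', om.ExecComp('y = 2.0 * x'), promotes=['*'])
p.model.add_subsystem('b', om.ExecComp('z = y + 1.0'), promotes=['*'])

# LinearRunOnce is the default linear solver; the point is simply NOT to
# assign a DirectSolver, which would factorize the full linear system.
p.model.linear_solver = om.LinearRunOnce()

p.setup()
p.run_model()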
The choice of linear solver is extremely model-dependent. One factor to consider is computational cost; another is numerical robustness. This issue is covered in some detail in Section 5.3 of the OpenMDAO paper, and I won't cover everything here, but very briefly, here is a summary of the key considerations.
When just starting out with OpenMDAO, using DirectSolver is both the simplest and usually the fastest option. It is simple because it does not require consideration of your model structure, and it is fast because for small models OpenMDAO can assemble the Jacobian into a dense or sparse matrix and hand that off for direct factorization. However, for larger models (or models with very large output vectors), the cost of computing the factorization is prohibitively high. In that case, you need to break the solver structure down more intentionally and use other linear solvers (sometimes in conjunction with the direct solver; see Section 5.3 of the OpenMDAO paper and this OpenMDAO doc).
You stated that you wanted to use the DirectSolver to take advantage of the sparse Jacobian storage. That was a good instinct, but the way OpenMDAO is structured, this is not a problem either way. We are pretty far down in the weeds now, but since you asked, I'll give a short summary. As of OpenMDAO 3.7, only the DirectSolver requires an assembled Jacobian at all (and in fact, it is the linear solver itself that determines this for whatever system it is attached to). All other LinearSolvers work with a DictionaryJacobian, which stores each sub-Jacobian keyed to the [of-var, wrt-var] pair. Each sub-Jacobian can be stored as dense or sparse, depending on how you declared that particular partial derivative. The DictionaryJacobian is effectively a form of sparse matrix, though not a traditional one. The key takeaway is that if you use LinearRunOnce (or any other solver), you get memory-efficient storage regardless. Only the DirectSolver switches over to a more traditional assembly of an actual matrix object.
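As an illustration of the "dense or sparse, depending on how you declared it" point, here is a hypothetical component that declares a sparse (diagonal) sub-Jacobian:

import numpy as np
import openmdao.api as om

class DoubleComp(om.ExplicitComponent):
    """Toy component: y = 2x, with a diagonal (sparse) partial."""

    def setup(self):
        self.add_input('x', shape=3)
        self.add_output('y', shape=3)
        # Only declare the nonzero entries; the DictionaryJacobian then
        # stores this sub-Jacobian in sparse form.
        self.declare_partials('y', 'x', rows=np.arange(3), cols=np.arange(3))

    def compute(self, inputs, outputs):
        outputs['y'] = 2.0 * inputs['x']

    def compute_partials(self, inputs, partials):
        partials['y', 'x'] = 2.0 * np.ones(3)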
2) Why does SLSQP stop after one iteration?
Gradient-based optimizations are very sensitive to scaling. I plotted your objective function across your allowed design space: the minimum is at about 6 degrees, but the objective values are tiny (about 1e-4).
As a general rule of thumb, getting your objective to around order of magnitude 1 is a good idea (we have a scaling report feature that helps with this). I added a reference that was about the order of magnitude of your objective:
p.model.add_objective('obj', ref=1e-4)
Then I got a good result:
Optimization terminated successfully (Exit mode 0)
Current function value: [3.02197589e-11]
Iterations: 7
Function evaluations: 9
Gradient evaluations: 7
Optimization Complete
-----------------------------------
Finished optimization
alpha = [6.04143334]
time: 2.1188600063323975 seconds
Unfortunately, scaling is just hard with gradient-based optimization. Scaling your objective and constraints to order 1 is a decent rule of thumb to start with, but it's common to need further adjustment for more complex problems.

AMPL IPOPT gives wrong optimal solution while solve result is "solved"

I am trying to solve a very simple optimization problem in AMPL with IPOPT as follows:
var x1 >= 0 ;
minimize obj: -(x1^2)+x1;
Obviously the problem is unbounded, but IPOPT gives me:
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
Ipopt is released as open source code under the Eclipse Public License (EPL).
For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************
This is Ipopt version 3.12.4, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).
Number of nonzeros in equality constraint Jacobian...: 0
Number of nonzeros in inequality constraint Jacobian.: 0
Number of nonzeros in Lagrangian Hessian.............: 1
Total number of variables............................: 1
variables with only lower bounds: 1
variables with lower and upper bounds: 0
variables with only upper bounds: 0
Total number of equality constraints.................: 0
Total number of inequality constraints...............: 0
inequality constraints with only lower bounds: 0
inequality constraints with lower and upper bounds: 0
inequality constraints with only upper bounds: 0
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
0 9.8999902e-03 0.00e+00 2.00e-02 -1.0 0.00e+00 - 0.00e+00 0.00e+00 0
1 1.5346023e-04 0.00e+00 1.50e-09 -3.8 9.85e-03 - 1.00e+00 1.00e+00f 1
2 1.7888952e-06 0.00e+00 1.84e-11 -5.7 1.52e-04 - 1.00e+00 1.00e+00f 1
3 -7.5005506e-09 0.00e+00 2.51e-14 -8.6 1.80e-06 - 1.00e+00 1.00e+00f 1
Number of Iterations....: 3
(scaled) (unscaled)
Objective...............: -7.5005505996934397e-09 -7.5005505996934397e-09
Dual infeasibility......: 2.5091040356528538e-14 2.5091040356528538e-14
Constraint violation....: 0.0000000000000000e+00 0.0000000000000000e+00
Complementarity.........: 2.4994494940593761e-09 2.4994494940593761e-09
Overall NLP error.......: 2.4994494940593761e-09 2.4994494940593761e-09
Number of objective function evaluations = 4
Number of objective gradient evaluations = 4
Number of equality constraint evaluations = 0
Number of inequality constraint evaluations = 0
Number of equality constraint Jacobian evaluations = 0
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations = 3
Total CPU secs in IPOPT (w/o function evaluations) = 0.001
Total CPU secs in NLP function evaluations = 0.000
EXIT: Optimal Solution Found.
Ipopt 3.12.4: Optimal Solution Found
suffix ipopt_zU_out OUT;
suffix ipopt_zL_out OUT;
ampl: display x1;
x1 = 0
When I change the solver to Gurobi, it gives this message:
Gurobi 6.5.0: unbounded; variable.unbdd returned.
which is what I expected.
I cannot understand why this happens, and now I don't know whether I need to check every problem I solve to make sure it hasn't converged to a wrong "optimal" solution. Since this is a super simple example, it is a little strange.
I would appreciate it if anybody could help me with this.
Thanks
You've already identified the basic problem, but elaborating a little on why these two solvers give different results:
IPOPT is designed to cover a wide range of optimisation problems, so it uses some fairly general numeric optimisation methods. I'm not familiar with the details of IPOPT, but usually this sort of approach relies on picking a starting point, looking at the curvature of the objective function in the neighbourhood of that starting point, and following the curvature "downhill" until it finds a local optimum. Different starting points can lead to different results. In this case IPOPT is probably defaulting to zero for the starting point, so it's right on top of that local minimum. As Erwin suggested, if you specify a different starting point it might find the unboundedness.
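To illustrate the starting-point effect, here is a minimal Pyomo sketch mirroring the AMPL model above (the Pyomo translation is mine; the original used AMPL directly):

import pyomo.environ as pyo

m = pyo.ConcreteModel()
m.x1 = pyo.Var(within=pyo.NonNegativeReals, initialize=10.0)
m.obj = pyo.Objective(expr=-(m.x1 ** 2) + m.x1, sense=pyo.minimize)

# From x1 = 10 the objective decreases without bound, so Ipopt should no
# longer report the boundary point x1 = 0 as an optimum.
results = pyo.SolverFactory('ipopt').solve(m, tee=True)
print(results.solver.termination_condition, pyo.value(m.x1))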
Gurobi is designed specifically for quadratic/linear problems, so it uses very different methods which aren't susceptible to local-minimum issues, and it will probably be much more efficient for quadratics. But it doesn't support more general objective functions.
I think I understand why it happened: the objective function
-(x1^2)+x1
is not convex, therefore the given solution is only a local optimum.

Genetic algorithm for optimization in game playing agent heuristic evaluation function

This is in response to an answer given in this question:
How to create a good evaluation function for a game?, particularly the first answer (by David).
Background: I am using a genetic algorithm to optimize the hyperparameters in a game-playing agent that uses minimax / alpha-beta pruning (with iterative deepening). In particular, I would like to optimize the heuristic (evaluation) function parameters using a genetic algorithm. The evaluation function I use is:
f(w) = w * num_my_moves - (1-w) * num_opponent_moves
The only parameter to optimize is w in [0,1].
Here's how I programmed the genetic algorithm:
Create a random population of say 100 agents
Let them play 1000 games at random with replacement.
Let the parents be the top performing agents with some poorer performing agents mixed in for genetic diversity.
Randomly breed some parents to create children. Breeding process: we define a child to be the average of the weights of its parents,
i.e. childWeight = 0.5 * (father.w + mother.w)
The new population is formed by the parents and the newly created children.
Randomly mutate 1% of the population as follows: newWeight = agent.w + random.uniform(-0.01, 0.01), clamping to handle the trivial border cases (i.e. values less than zero or greater than one) appropriately.
Evolve 10 times (i.e. repeat for new population)
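For reference, here is a compact sketch of the procedure described above (the population sizes, selection split, and placeholder fitness are illustrative; my actual fitness comes from the games played):

import random

POP, GENERATIONS = 100, 10

def fitness(w):
    # Placeholder: in the real setup this is the win rate from the
    # random-pairing tournament, not a closed-form function.
    return -abs(w - 0.6)

population = [random.random() for _ in range(POP)]

for _ in range(GENERATIONS):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:40] + random.sample(ranked[40:], 10)  # keep diversity
    children = [0.5 * (random.choice(parents) + random.choice(parents))
                for _ in range(POP - len(parents))]
    population = parents + children
    for i in range(len(population)):
        if random.random() < 0.01:  # mutate 1% of the population
            population[i] = min(1.0, max(0.0,
                population[i] + random.uniform(-0.01, 0.01)))

print(max(population, key=fitness))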
My question: Please evaluate the points above. In particular, does anyone have a better way to breed (rather than trivially averaging the parent weights), and does anyone have a better way to mutate, rather than just adding random.uniform(-0.01, 0.01)?
It looks like you're not actually applying a genetic algorithm to your agents, but rather simple evolution directly on the phenotype/weights. I suggest you try introducing a genetic representation of your weights and evolving this genome instead. An example would be to represent your weights as a binary string and apply evolution to each bit of the string, meaning there is a likelihood that each bit gets mutated. These are called point mutations. There are many other mutations you can apply, but this would do as a start.
What you will notice is that your agents don't get stuck in local minima as much, because sometimes a small genetic change can vastly change the phenotype/weights.
OK, that might sound complicated, but it's not really. Let me give you an example:
Say you have a weight of 42 in base 10. This would be 101010 in binary. Now you have implemented a 1% mutation rate on each bit of the binary representation. Let's say the last bit is flipped. Then we have 101011 in binary, or 43 in decimal. Not such a big change. Doing the same with the second bit on the other hand gives you 111010 in binary or 58 decimal. Notice the big jump. This is what we want, and lets your agent population search a larger part of the solution space faster.
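Here is a small sketch of such a point mutation on a 6-bit encoded weight (the names and the demonstration rate are illustrative):

import random

BITS = 6

def point_mutate(genome: int, rate: float = 0.01) -> int:
    """Flip each of the genome's bits independently with probability rate."""
    for i in range(BITS):
        if random.random() < rate:
            genome ^= 1 << i  # flip bit i
    return genome

w = 0b101010                                     # 42, as in the example
print(format(point_mutate(w, rate=0.5), '06b'))  # high rate just to see flips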
With regard to breeding, you can try crossover. Let's assume you have many weights, each with a genetic encoding. If you represent the whole genome (all the binary data) as one long binary string, you can combine sections of the two parents' genomes. An example, again. The following are the "father" and "mother" genomes and phenotypes:
Weight Name: W1 W2 W3 W4 W5
Father Phenotype: 43 15 34 17 14
Father Genome: 101011 001111 100010 010001 001110
Mother Genome: 100110 100111 011001 010100 101000
Mother Phenotype: 38 39 25 20 40
What you can do is draw arbitrary lines through both genomes at the same place, and assign the segments arbitrarily to the child. This is a version of crossover.
Weight Name: W1 W2 W3 W4 W5
Father Genome: 101011 00.... ...... .....1 001110
Mother Genome: ...... ..0111 011001 01010. ......
Child Genome: 101011 000111 011001 010101 001110
Child Phenotype: 43 7 25 21 14
Here the first 8 and the last 7 bits come from the father, and the middle comes from the mother. Notice how weights W1 and W5 are entirely from the father, and W3 is entirely from the mother, while W2 and W4 are combinations. W4 had hardly any change, while W2 changed drastically.
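A sketch of this single-segment crossover over the concatenated bit string; with cut points 8 and 23 it reproduces the child genome in the example above (the function name is mine):

def crossover(father: str, mother: str, a: int, b: int) -> str:
    """Splice the mother's bits in [a, b) into the father's genome."""
    assert len(father) == len(mother)
    return father[:a] + mother[a:b] + father[b:]

father = "101011" + "001111" + "100010" + "010001" + "001110"
mother = "100110" + "100111" + "011001" + "010100" + "101000"

child = crossover(father, mother, 8, 23)
print(child)  # 101011000111011001010101001110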
I hope this gives you some insight into how to do genetic algorithms. That said, I recommend using a modern library instead of implementing it yourself, unless you are doing it to learn.
Edit: More on handling the weights/binary representation:
If you need fractions, you can do this by separating the numerator and denominator as different weights, or by having one of them be a constant (e.g., 42 and 10 gives 4.2).
Greater-than-zero constraints come for free. To actually get negative numbers you need to negate your weights.
A less-than-one constraint you can get by dividing the weight by the maximum possible value for that bit-string length. In the examples above you have 6 bits, which can represent a maximum of 63. If after mutation you get a binary string of 101010, or 42 in base 10, you do 42/63, getting 0.667; it can only ever get as high as 1.0, when the string is 63/63.
Two weights summing to 1? If you get 101010 and 001000 for W1 and W2, that gives 42 and 8; then you can compute W1_scaled = W1 / (W1 + W2) = 0.84 and W2_scaled = W2 / (W1 + W2) = 0.16. This gives W1_scaled + W2_scaled = 1 always.
Since I was mentioned.
Rather than averaging the parent weights, I picked random numbers using the parent weights as a min/max. I additionally found I had to widen the range slightly (compensating for the reduction in standard deviation when averaging two uniform random numbers; the factor was about sqrt(2), though I probably wasn't exact) to resist the pull toward the average. Otherwise the population converges toward the average and can't escape.
So if the parents' weights were 0.1 and 0.2, it might pick a random number between 0.08 and 0.22 for the child weight.
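In code, that breeding rule looks roughly like this (a sketch of my description above; the exact widening factor was approximate):

import math
import random

def breed(w1: float, w2: float, widen: float = math.sqrt(2)) -> float:
    """Pick a child weight near the parents, over a slightly widened range."""
    mid = (w1 + w2) / 2.0
    half_range = abs(w1 - w2) / 2.0 * widen
    child = random.uniform(mid - half_range, mid + half_range)
    return min(1.0, max(0.0, child))  # clamp to [0, 1] as in the question

print(breed(0.1, 0.2))  # falls roughly in [0.08, 0.22]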
Late edit: A more accepted, studied, and better-understood approach that I didn't know about at the time is called "Differential Evolution".

Simple voice recognition when whispering

I'm trying to do simple voice-to-text mapping using pocketsphinx. The grammar is very simple, for example:
public <grammar> = (Matt, Anna, Tom, Christine)+ (One | Two | Three | Four | Five | Six | Seven | Eight | Nine | Zero)+ ;
e.g:
Tom Anna Three Three
yields
Tom Anna 33
I adapted the acoustic model (to take into account my foreign accent), and after that I achieved decent performance (~94% accuracy). I used a training dataset of ~3 minutes.
Right now I'm trying to do the same but whispering into the microphone. The accuracy dropped significantly, to ~50% without training; with training for accent I got ~60%. I tried other things, including denoising and boosting the volume. I read the whole docs, but was wondering if anyone could answer some questions so I can better understand in which direction I should go to improve performance.
1) In the tutorial you adapt the hub4wsj_sc_8k acoustic model. I guess "8k" is a sampling parameter. When using sphinx_fe you use "-samprate 16000". Was it deliberate to train the 8k model using data with a 16k sampling rate? Why wasn't data with 8k sampling used? Does it influence performance?
2) In Sphinx 4.1 (in comparison to pocketsphinx) there are different acoustic models, e.g. WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar. Can those models be used with pocketsphinx? Will an acoustic model with 16k sampling typically perform better on data with a 16k sampling rate?
3) When using data for training, should I use recordings in normal speaking mode (to adapt only to my accent) or in whispering mode (to adapt to whispering and my accent)? I think I tried both scenarios and didn't notice any difference that would support a conclusion, but I don't know the pocketsphinx internals, so I might be doing something wrong.
4) I used the following script to record the adaptation training and testing data from the tutorial:
for i in `seq 1 20`; do
    fn=`printf arctic_%04d $i`;
    read sent; echo $sent;
    rec -r 16000 -e signed-integer -b 16 -c 1 $fn.wav 2>/dev/null;
done < arctic20.txt
I noticed that each time I hit Control-C, the keypress is audible in the recorded audio, which led to errors. Trimming the audio sometimes corrected this, and sometimes led to other errors instead. Is there any requirement that each recording have a few seconds of quiet before and after speaking?
5) When accumulating observation counts, are there any settings I can tinker with to improve performance?
6) What's the difference between a semi-continuous and a continuous model? Can pocketsphinx use a continuous model?
7) I noticed that the 'mixture_weights' file from Sphinx4 is much smaller compared to the one in pocketsphinx-extra. Does it make any difference?
8) I tried different combinations of removing white noise (using the 'sox' toolkit, e.g. sox noisy.wav filtered.wav noisered profile.nfo 0.1). Depending on the last parameter, sometimes it improved things a little (~3%) and sometimes it made them worse. Is it good to remove noise, or is it something pocketsphinx does as well? My environment is quiet; there is only white noise, which I guess can have more impact when the audio is recorded while whispering.
9) I noticed that boosting the volume (gain) alone most of the time only made the performance a little worse, even though it was easier for humans to distinguish the words. Should I avoid it?
10) Overall, I tried different combinations, and the best result I got was ~65% when only removing noise, so only a slight (5%) improvement. Below are some stats:
//ORIGINAL UNPROCESSED TESTING FILES
TOTAL Words: 111 Correct: 72 Errors: 43
TOTAL Percent correct = 64.86% Error = 38.74% Accuracy = 61.26%
TOTAL Insertions: 4 Deletions: 13 Substitutions: 26
//DENOISED + VOLUME UP
TOTAL Words: 111 Correct: 76 Errors: 42
TOTAL Percent correct = 68.47% Error = 37.84% Accuracy = 62.16%
TOTAL Insertions: 7 Deletions: 4 Substitutions: 31
//VOLUME UP
TOTAL Words: 111 Correct: 69 Errors: 47
TOTAL Percent correct = 62.16% Error = 42.34% Accuracy = 57.66%
TOTAL Insertions: 5 Deletions: 12 Substitutions: 30
//DENOISE, threshold 0.1
TOTAL Words: 111 Correct: 77 Errors: 41
TOTAL Percent correct = 69.37% Error = 36.94% Accuracy = 63.06%
TOTAL Insertions: 7 Deletions: 3 Substitutions: 31
//DENOISE, threshold 0.21
TOTAL Words: 111 Correct: 80 Errors: 38
TOTAL Percent correct = 72.07% Error = 34.23% Accuracy = 65.77%
TOTAL Insertions: 7 Deletions: 3 Substitutions: 28
That processing was applied only to the testing data. Should the training data be processed in the same way? I think I tried that, but there was barely any difference.
11) In all those tests I used the ARPA language model. When using JSGF, results were usually much worse (I have the latest pocketsphinx branch). Why is that?
12) Because in each sentence the maximum number would be '999' and there are no more than 3 names, I modified the JSGF and replaced the repetition sign '+' by manually repeating the content in the parentheses. This time the results were much closer to ARPA. Is there any way in a grammar to specify the maximum number of repetitions, as in regular expressions?
13) When using the ARPA model, I generated it using all possible combinations (since the dictionary is fixed and really small: ~15 words), but then during testing I still sometimes received illegal results, e.g. Tom Anna (without any required number). Is there any way to enforce some structure using the ARPA model?
14) Should the dictionary be limited to only those ~15 words, or will a full dictionary only affect speed but not accuracy?
15) Is modifying the dictionary (phonemes) the way to go to improve recognition when whispering? (I'm not an expert, but when we whisper I guess some words might sound different?)
16) Any other tips on how to improve accuracy would be really helpful!
Regarding whispering: when you whisper, the sound waves don't have meaningful periodic parts (the vibrations that result from your vocal cords resonating during normal speech are absent when whispering). You can try this by putting your finger on your throat while loudly saying 'aaaaaa', and then just whispering it.
AFAIR acoustic modeling relies a lot on taking the frequency spectrum of the sound to detect peaks (formants) and relate them to phones (like vowels).
Educated guess: when whispering, the spectrum is mostly white noise, slightly shaped by the oral position (tongue, openness of the mouth, etc.), which is enough for humans but far from enough to make the peaks distinguishable by a computer.
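You can see this for yourself by comparing spectra. A rough numpy sketch (the filenames are hypothetical; this is not part of pocketsphinx):

import numpy as np
from scipy.io import wavfile

def strongest_frequencies(path, top=5):
    """Return the strongest frequency bins of a mono wav file."""
    rate, data = wavfile.read(path)
    if data.ndim > 1:
        data = data[:, 0]                 # keep one channel
    spec = np.abs(np.fft.rfft(data * np.hanning(len(data))))
    freqs = np.fft.rfftfreq(len(data), d=1.0 / rate)
    idx = np.argsort(spec)[-top:]
    return sorted(zip(freqs[idx], spec[idx]))

print(strongest_frequencies("voiced_aaaa.wav"))     # clear formant peaks
print(strongest_frequencies("whispered_aaaa.wav"))  # flatter, noisier spectrum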