How Floyd Warshall algorithm works for N=500 within 1 second? - time-complexity

I am learning graph theory, and I found a problem Shortest Route II on CSES website.
https://cses.fi/problemset/task/1672/ is the problem.
I found the problem could be solved using Floyd Warshall Algorithm and the time complexity for the algorithm is O(n^3).
I am confused because here the constraints are N<=500, Time limit: 1s, so when N=500 the computations should be 500^3 = 125,000,000 and keeping in mind that in 1 second, 10^8 computations are performed, therefore it should at least take 1.25 seconds causing TLE, but the solution gets accepted. I want to learn what I am doing wrong in my calculations or understanding.

Related

Is it possible to create or access the gap history

In Julia JuMP, I am wondering whether it is possible to access an array, or similar, of the GAP history, the GAP improvements.
I know that at the end of the optimization, I can access the final gap with relative_gap(m) with m of type Model and I want the history of improvements over the solving process.
It could be for instance a vector of Float64 in percent: gaps=[20.2, 16.7, 13.8, 5.1, 0]. In such case my problem is solved to optimality as the final gap is 0.
At the end, I would like to plot the gap improvement in function of time of solving. So maybe all values of gaps could be a couple of two elements, the gap and the solving time until this new gap improvement?
Maybe this is not possible in JuMP so you could answer for Gurobi ?
Thanks!
The Gurobi callback solution has been answered by Oscar at another post - see Julia JUMP Gurobi MIP - query and store best objective and bound at runtime.
However, since Gurobi is very serious about its TimeLimit and has low times to continue computations there is also a simpler approach (this code assumes mo is your JuMP Gurobi model):
set_optimizer_attribute(mo, "TimeLimit", 2.0)
gap_history = Float64[]
max_steps = 1800
for t in 1:max_steps
optimize!(mo)
gap = MOI.get(mo, MOI.RelativeGap())
status=termination_status(mo)
push!(gap_history, gap)
gap > 0.01 || status == MOI.TIME_LIMIT || break
end
It's not possible using JuMP in general.
You could do this in Gurobi using a solver-specific callback: https://github.com/jump-dev/Gurobi.jl#callbacks
Or you could just parse the Gurobi log file that gets printed.
In general though, this information isn't all that useful as it can be quite noisy. Why do you want it?

SLSQP in ScipyOptimizeDriver only executes one iteration, takes a very long time, then exits

I'm trying to use SLSQP to optimise the angle of attack of an aerofoil to place the stagnation point in a desired location. This is purely as a test case to check that my method for calculating the partials for the stagnation position is valid.
When run with COBYLA, the optimisation converges to the correct alpha (6.04144912) after 47 iterations. When run with SLSQP, it completes one iteration, then hangs for a very long time (10, 20 minutes or more, I didn't time it exactly), and exits with an incorrect value. The output is:
Driver debug print for iter coord: rank0:ScipyOptimize_SLSQP|0
--------------------------------------------------------------
Design Vars
{'alpha': array([0.5])}
Nonlinear constraints
None
Linear constraints
None
Objectives
{'obj_cmp.obj': array([0.00023868])}
Driver debug print for iter coord: rank0:ScipyOptimize_SLSQP|1
--------------------------------------------------------------
Design Vars
{'alpha': array([0.5])}
Nonlinear constraints
None
Linear constraints
None
Objectives
{'obj_cmp.obj': array([0.00023868])}
Optimization terminated successfully. (Exit mode 0)
Current function value: 0.0002386835700364719
Iterations: 1
Function evaluations: 1
Gradient evaluations: 1
Optimization Complete
-----------------------------------
Finished optimisation
Why might SLSQP be misbehaving like this? As far as I can tell, there are no incorrect analytical derivatives when I look at check_partials().
The code is quite long, so I put it on Pastebin here:
core: https://pastebin.com/fKJpnWHp
inviscid: https://pastebin.com/7Cmac5GF
aerofoil coordinates (NACA64-012): https://pastebin.com/UZHXEsr6
You asked two questions whos answers ended up being unrelated to eachother:
Why is the model so slow when you use SLSQP, but fast when you use COBYLA
Why does SLSQP stop after one iteration?
1) Why is SLSQP so slow?
COBYLA is a gradient free method. SLSQP uses gradients. So the solid bet was that slow down happened when SLSQP asked for the derivatives (which COBYLA never did).
Thats where I went to look first. Computing derivatives happens in two steps: a) compute partials for each component and b) solve a linear system with those partials to compute totals. The slow down has to be in one of those two steps.
Since you can run check_partials without too much trouble, step (a) is not likely to be the culprit. So that means step (b) is probably where we need to speed things up.
I ran the summary utility (openmdao summary core.py) on your model and saw this:
============== Problem Summary ============
Groups: 9
Components: 36
Max tree depth: 4
Design variables: 1 Total size: 1
Nonlinear Constraints: 0 Total size: 0
equality: 0 0
inequality: 0 0
Linear Constraints: 0 Total size: 0
equality: 0 0
inequality: 0 0
Objectives: 1 Total size: 1
Input variables: 87 Total size: 1661820
Output variables: 44 Total size: 1169614
Total connections: 87 Total transfer data size: 1661820
Then I generated an N2 of your model and saw this:
So we have an output vector that is 1169614 elements long, which means your linear system is a matrix that is about 1e6x1e6. Thats pretty big, and you are using a DirectSolver to try and compute/store a factorization of it. Thats the source of the slow down. Using DirectSolvers is great for smaller models (rule of thumb, is that the output vector should be less than 10000 elements). For larger ones you need to be more careful and use more advanced linear solvers.
In your case we can see from the N2 that there is no coupling anywhere in your model (nothing in the lower triangle of the N2). Purely feed-forward models like this can use a much simpler and faster LinearRunOnce solver (which is the default if you don't set anything else). So I turned off all DirectSolvers in your model, and the derivatives became effectively instant. Make your N2 look like this instead:
The choice of best linear solver is extremely model dependent. One factor to consider is computational cost, another is numerical robustness. This issue is covered in some detail in Section 5.3 of the OpenMDAO paper, and I won't cover everything here. But very briefly here is a summary of the key considerations.
When just starting out with OpenMDAO, using DirectSolver is both the simplest and usually the fastest option. It is simple because it does not require consideration of your model structure, and it's fast because for small models OpenMDAO can assemble the Jacobian into a dense or sparse matrix and provide that for direct factorization. However, for larger models (or models with very large vectors of outputs), the cost of computing the factorization is prohibitively high. In this case, you need to break the solver structure down more intentionally, and use other linear solvers (sometimes in conjunction with the direct solver--- see Section 5.3 of OpenMDAO paper, and this OpenMDAO doc).
You stated that you wanted to use the DirectSolver to take advantage of the sparse Jacobian storage. That was a good instinct, but the way OpenMDAO is structured this is not a problem either way. We are pretty far down in the weeds now, but since you asked I'll give a short summary explanation. As of OpenMDAO 3.7, only the DirectSolver requires an assembled Jacobian at all (and in fact, it is the linear solver itself that determines this for whatever system it is attached to). All other LinearSolvers work with a DictionaryJacobian (which stores each sub-jac keyed to the [of-var, wrt-var] pair). Each sub-jac can be stored as dense or sparse (depending on how you declared that particular partial derivative). The dictionary Jacobian is effectively a form of a sparse-matrix, though not a traditional one. The key takeaway here is that if you use the LinearRunOnce (or any other solver), then you are getting a memory efficient data storage regardless. It is only the DirectSolver that changes over to a more traditional assembly of an actual matrix object.
Regarding the issue of memory allocation. I borrowed this image from the openmdao docs
2) Why does SLSQP stop after one iteration?
Gradient based optimizations are very sensitive to scaling. I ploted your objective function inside your allowed design space and got this:
So we can see that the minimum is at about 6 degrees, but the objective values are TINY (about 1e-4).
As a general rule of thumb, getting your objective to around order of magnitude 1 is a good idea (we have a scaling report feature that helps with this). I added a reference that was about the order of magnitude of your objective:
p.model.add_objective('obj', ref=1e-4)
Then I got a good result:
Optimization terminated successfully (Exit mode 0)
Current function value: [3.02197589e-11]
Iterations: 7
Function evaluations: 9
Gradient evaluations: 7
Optimization Complete
-----------------------------------
Finished optimization
alpha = [6.04143334]
time: 2.1188600063323975 seconds
Unfortunately, scaling is just hard with gradient based optimization. Starting by scaling your objective/constraints to order-1 is a decent rule of thumb, but its common that you need to adjust things beyond that for more complex problems.

Time complexity of a genetic algorithm for bin packing

I am trying to explore genetic algorithms (GA) for the bin packing problem, and compare it to classical Any-Fit algorithms. However the time complexity for GA is never mentioned in any of the scholarly articles. Is this because the time complexity is very high? and that the main goal of a GA is to find the best solution without considering the time? What is the time complexity of a basic GA?
Assuming that termination condition is number of iterations and it's constant then in general it would look something like that:
O(p * Cp * O(Crossover) * Mp * O(Mutation) * O(Fitness))
p - population size
Cp - crossover probability
Mp - mutation probability
As you can see it not only depends on parameters like eg. population size but also on implementation of crossover, mutation operations and fitness function implementation. In practice there would be more parameters like for example chromosome size etc.
You don't see much about time complexity in publications because researchers most of the time compare GA using convergence time.
Edit Convergence Time
Every GA has some kind of a termination condition and usually it's convergence criteria. Let's assume that we want to find the minimum of a mathematical function so our convergence criteria will be the function's value. In short we reach convergence during optimization when it's no longer worth it to continue optimization because our best individual doesn't get better significantly. Take a look at this chart:
You can see that after around 10000 iterations fitness doesn't improve much and the line is getting flat. Best case scenario reaches convergence at around 9500 iterations, after that point we don't observe any improvement or it's insignificantly small. Assuming that each line shows different GA then Best case has the best convergence time becuase it reaches convergence criteria first.

Break if Newton's method is not convergent

I'm trying to implement Newton's method for polynomials to find zero of function. But I must predict case when function hasn't a root. I'm wondering how can I detect moment when the method becomes divergent and then stop procedure?
Thank you in advance for any help
Generally, if the root is not found after 10 iterations, then the initial point was bad. To be safe take 15 or 20 iterations. Or check after 5-10 iterations for quadratic convergence, measured by the function value decreasing from iteration to iteration faster than by a factor of 0.25.
Restart in a bad case with a different point.

LDPC behaviour as density of parity-check matrix increases

My assignment is to implement a Loopy Belief Propagation algorithm for Low-density Parity-check Code. This code uses a parity-check matrix H which is rather sparse (say 750-by-1000 binary matrix with an average of about 3 "ones" per each column). The code to generate the parity-check matrix is taken from here
Anyway, one of the subtasks is to check the reliability of LDPC code when the density of the matrix H increases. So, I fix the channel at 0.5 capacity, fix my code speed at 0.35 and begin to increase the density of the matrix. As the average number of "ones" in a column goes from 3 to 7 in steps of 1, disaster happens. With 3 or 4 the code copes perfectly well. With higher density it begins to fail: not only does it sometimes fail to converge, it oftentimes converges to the wrong codeword and produces mistakes.
So my question is: what type of behaviour is expected of an LDPC code as its sparse parity-check matrix becomes denser? Bonus question for skilled mind-readers: in my case (as the code performance degrades) is it more likely because the Loopy Belief Propagation algo has no guarantee on convergence or because I made a mistake implementing it?
After talking to my TA and other students I understand the following:
According to Shannon's theorem, the reliability of the code should increase with the density of the parity check matrix. That is simply because more checks are made.
However, since we use Loopy Belief Propagation, it struggles a lot when there are more and more edges in the graph forming more and more loops. Therefore, the actual performance degrades.
Whether or not I made a mistake in my code based solely on this behaviour cannot be established. However, since my code does work for sparse matrices, it is likely that the implementation is fine.