Downhill Simplex on finite intervals - optimization

I've been reading up on Downhill Simplex (Nelder-Mead) optimization, but what I haven't found are good proposals for what to do when the parameters / coordinates are bound to a fixed interval. What is the best way to handle the case where one parameter reaches the limit of its interval, and in particular to avoid it getting "stuck" there?
Let's say I optimize a function of 10-20 parameters, each limited to a finite interval, say [0, 100]. What is the right course of action if the algorithm would push one or several of the parameters over the limits (<0 or >100)?
Thanks,
Martin
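For reference, one commonly suggested workaround (my own illustrative sketch, not from this thread) is to reparameterize each bounded coordinate through a periodic transform, so the simplex itself only ever sees unbounded variables and can never leave the box. A minimal Python sketch, assuming SciPy is available and using a made-up quadratic objective as a placeholder:

```python
import numpy as np
from scipy.optimize import minimize  # assumes SciPy is available

LO, HI = 0.0, 100.0  # the fixed interval from the question

def to_bounded(y):
    # Map an unbounded vector y into [LO, HI] with a sine transform, so the
    # simplex can move freely and never produces out-of-range parameters.
    return LO + (HI - LO) * (np.sin(y) + 1.0) / 2.0

def objective(x):
    # Placeholder objective on the bounded parameters; replace with your own.
    return float(np.sum((x - 42.0) ** 2))

res = minimize(lambda y: objective(to_bounded(y)),
               x0=np.zeros(15),          # 15 parameters, starting mid-interval
               method='Nelder-Mead')
print(to_bounded(res.x))                 # optimum mapped back into [0, 100]
```

Penalty terms or simple clamping of out-of-range points are alternatives; the transform above just has the property that no evaluation ever happens outside the interval.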

Related

When calculating the time complexity of an algorithm can we count the addition of two numbers of any size as requiring 1 "unit" of time or O(1) units?

I am working on analysing the time complexity of an algorithm. I am not certain what the correct way is to account for the time complexity of basic operations such as addition and subtraction of two numbers. I have learnt that the time complexity of adding two n-digit numbers is O(n), because that is how many elementary bit operations you need to perform during the addition. However, I have heard recently that nowadays, in modern processors, the time taken to add two numbers of any size (that is still manageable by a computer) is constant: it does not depend on the size of the two numbers. Hence, in the time complexity analysis of an algorithm, you should count the addition of two numbers of any size as O(1). Which approach is correct? Or, in case both approaches are "correct" when used in the appropriate context, which approach is more acceptable in a research paper? Thank you for any answer in advance.
It depends on the kind of algorithm you are analyzing, but in the general case you simply assume the inputs to the algorithm will fit into the word size of the machine it runs on (be that 32 bits, 128 bits, whatever). Under that assumption, any single arithmetic operation will probably be executed as a single machine instruction and be computed in a single or small constant number of CPU clock cycles, regardless of the underlying complexity of the hardware implementation, so you treat the complexity of that operation as O(1). That is, you assume O(1) complexity for arithmetic operations unless there is a particular reason to believe they cannot be handled in constant time.
You would only really break the O(1) assumption if you were specifically designing an algorithm for numerical inputs of arbitrary precision, such that you plan to compute the arithmetic operations programmatically yourself rather than handing them off entirely to hardware (your algorithm expects overflows/underflows and is designed to handle them), or if you were working down at the level of implementing these operations yourself in an ALU or FPU circuit. Then, whether multiplication is performed in O(n log n) or O(n log n log log n) time in the number of bits actually becomes relevant to your complexity analysis: the number of bits involved in these operations is not bounded by some constant, or you are specifically analyzing the complexity of an algorithm or piece of hardware that itself implements an arithmetic operation.
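As a rough illustration of the difference (my own sketch, not from the answer): Python integers are arbitrary precision, so addition cost grows with the number of digits once the numbers no longer fit in a machine word, whereas word-sized additions are effectively constant time.

```python
import random
import time

def addition_time(n_digits, repeats=10_000):
    # Two random n-digit integers; Python ints are arbitrary precision, so the
    # cost of '+' grows with the number of digits beyond a machine word.
    a = random.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    b = random.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    start = time.perf_counter()
    for _ in range(repeats):
        a + b
    return time.perf_counter() - start

for digits in (10, 10_000, 100_000):
    print(digits, "digits:", addition_time(digits))
```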

What is the relationship between time complexity and the number of steps in an algorithm?

For large values of n, an algorithm that takes 20000n^2 steps has better time complexity (takes less time) than one that takes 0.001n^5 steps
I believe this statement is true. But, why?
If there are more steps wouldn't that take more time?
Computational complexity is considered in the asymptotic sense because the important question is usually one of scaling. Even in your clear-cut case, the n^5 algorithm begins to take longer at around 271 items (where 20000n^2 = 0.001n^5, i.e. n^3 = 2*10^7), which isn't very many; plotting both functions, for example on Wolfram Alpha, makes this easy to see.
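A quick check of that crossover point (my own snippet):

```python
# Where does 0.001*n**5 overtake 20000*n**2?  Setting them equal gives
# n**3 = 2e7, i.e. n = (2e7) ** (1/3).
print((2e7) ** (1 / 3))            # approximately 271.44
for n in (271, 272):
    print(n, 20000 * n**2, 0.001 * n**5)
```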
Quoting from the wikipedia article linked above:
Usually asymptotic estimates are used because different implementations of the same algorithm may differ in efficiency. However the efficiencies of any two "reasonable" implementations of a given algorithm are related by a constant multiplicative factor called a hidden constant.
All that said, if you have two comparable algorithms, the one with lower complexity has a significant constant coefficient, and you are only going to process 10 items, then it may very well be a good idea to choose the asymptotically less efficient one. Some common libraries even switch algorithms depending on the size of the data being processed; this is called a hybrid algorithm, and Python's sorted implementation, Timsort, uses it to switch between insertion sort and merge sort.
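A toy sketch of that size-based switching idea (mine, not Timsort's actual logic; the threshold of 32 is an arbitrary illustrative choice):

```python
THRESHOLD = 32  # below this size, the low-constant-factor algorithm wins

def insertion_sort(a):
    # O(n^2) but very cheap per step for small inputs.
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def hybrid_sort(a):
    # Small slices: insertion sort.  Larger slices: merge sort with recursion.
    if len(a) <= THRESHOLD:
        return insertion_sort(a)
    mid = len(a) // 2
    left, right = hybrid_sort(a[:mid]), hybrid_sort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]
```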

When to switch from Dynamic Programming (2D table) to Branch & Bound algorithm?

I'm doing a knapsack optimization problem involving dynamic programming and branch & bound. I noticed that when the capacity and the number of items get large, filling up the 2D table for the dynamic programming algorithm gets dramatically slower. At some point, am I supposed to switch algorithms depending on the size of the problem (since the lecture gave us two types of optimization)?
I've tried to google at what point (what size) I should switch from dynamic programming to branch & bound, but I couldn't find the result I wanted.
Or is there another way of looking at the knapsack problem in which I can combine dynamic programming and branch & bound into one algorithm, instead of switching algorithms depending on the size of the problem?
Thanks.
Often when you have several algorithms that solve a problem but whose runtimes have different characteristics, you determine (empirically, not theoretically) when one algorithm becomes faster than the other. This is highly implementation- and machine-dependent. So measure both the DP algorithm and the B&B algorithm and figure out which one is better when.
A couple of hints:
You know that DP's runtime is proportional to the number of objects times the size of the knapsack.
You know that B&B's runtime can be as bad as 2^(number of objects), but it's typically much better. Try to figure out the worst case.
Caches and stuff are going to matter for DP, so you'll want to estimate its runtime piecewise. Figure out where the breakpoints are.
DP takes up exorbitant amounts of memory. If it's going to take up too much, you don't really have a choice between DP and B&B.
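For reference, a minimal 0/1 knapsack DP (my sketch, not from the answer): the table has (number of items + 1) x (capacity + 1) cells, so both runtime and memory are proportional to items times capacity, which is exactly where DP blows up for large capacities.

```python
def knapsack_dp(values, weights, capacity):
    n = len(values)
    # best[i][c] = best value using the first i items with remaining capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]                       # skip item i-1
            if weights[i - 1] <= c:                           # or take it
                best[i][c] = max(best[i][c],
                                 best[i - 1][c - weights[i - 1]] + values[i - 1])
    return best[n][capacity]

print(knapsack_dp([60, 100, 120], [10, 20, 30], 50))  # -> 220
```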

Is flop per second a measure of the speed of a processor, or a measure of the speed of an algorithm?

1) I can see very clearly that: the number of floating point operations a computer can do in one second is a good way of quantifying its performance. That's correct, right?
2) My teacher keeps asking me to calculate the flop rate for algorithms I program. I do this by counting how many flops the algorithm performs and timing how long it takes to run. In this situation the flop rate always falls way short of the flop rate I expect from the computer I'm using. So for algorithms, is the flop rate more an assessment of how long the 'other stuff' takes (i.e. overheads, work that doesn't involve flopping)? That is, when the measured flop rate is low, most of the program's time is spent calling functions etc. and not performing flops, correct?
I know this is a very broad question but I was hoping for some ideas from those in industry or academia about what they intuitively feel the flop rate of an algorithm actually is.
Properly, “flops” is a measure of processor or system performance. Many people misuse it as a measure of implementation or algorithm speed.
Suppose you had a computation to perform that is fixed in the number of operations it takes. For example, you want to multiply a matrix with dimensions a•b with a matrix with dimensions b•c. If you perform this multiplication in the usual way, then, in each combination of one of a rows and one of c columns, you perform b multiplications and b-1 additions. So the entire matrix multiplication takes a•c•(2b-1) floating-point operations. If it finishes in one second, some people say it is providing a•c•(2b-1) flops.
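A quick way to produce that kind of "flops" figure (my own sketch, using NumPy merely as a convenient stand-in for whatever multiplication routine you are measuring):

```python
import time
import numpy as np  # assumed available; any matrix-multiply routine would do

a, b, c = 512, 512, 512
A = np.random.rand(a, b)
B = np.random.rand(b, c)

nominal_ops = a * c * (2 * b - 1)      # b multiplications + (b - 1) additions per entry

start = time.perf_counter()
C = A @ B
elapsed = time.perf_counter() - start

print(f"{nominal_ops / elapsed / 1e9:.2f} 'GFLOPS' by the nominal operation count")
```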
If you have two programs that both do the multiplication the same way, you can compare them using this figure. The one of them that has more “flops” is better. Even though they use the same algorithm, one of them might have a better implementation, perhaps because it organizes the work more efficiently for memory cache.
This breaks when somebody figures out a new algorithm that gets the same job done with fewer operations. Then some people compare programs (or routines) using the nominal number of operations of the original method, even though the program actually performs fewer operations.
To some extent, this makes sense. If you have two programs that do the same job, and one of them has a higher number of “flops” calculated this way, then it is the program that gives you the answer more quickly.
However, it does not make sense to the extent that it introduces inaccuracy. We are often not interested in a single problem size but in various sizes, and the “flops” of a program will not scale linearly with the nominal number of operations once a new algorithm is used.
By analogy, suppose it is 80 kilometers from town A to town B over the mountain road that everybody uses. If it takes your car an hour to make the trip, your car is traveling 80 kilometers an hour. While out exploring one day, you discover a pass through the mountains that reduces the trip to 70 kilometers. Now you can make the trip in 52.5 minutes. The same calculation that some people do with “flops” would say your car is going 91.4 kilometers per hour, since it makes the 80-kilometer trip in 52.5 minutes.
That is obviously wrong. However, it is useful for deciding which route to take.
FLOPS means the number of Floating-Point Operations Per Second executed by a processor. That can be a purely theoretical figure derived from a hardware/architecture specification, or an empirical result from running some algorithm that is tuned to give high numbers.
The main difficulty in FLOPS calculation comes from systems with multiple, parallel execution units. AFAIK it is only in that context that it gets really tough to split a practical algorithm (e.g. FFT, or RGB->YUV conversion) into the set of instructions that makes best use of all the calculation units in a CPU. (For example, without automatic vectorization an x64 system often performs floating-point operations only in the lowest element of an XMM register (Xmm0[0]), wasting 50-75% of the full potential.)
This partly answers question 2. Besides the obvious stalls introduced by limited cache/memory-to-register bandwidth, the next crucial obstacle on the way to maximum FLOPS figures is data sitting in the wrong register. That is something often completely ignored in complexity analyses, which, just like FLOPS calculations, only count basic arithmetic operations. With parallel (SIMD) programming it often happens that not just one but 4, 8 or 16 values are in the wrong registers, with no way of permuting them all at once cheaply. Add to that the overhead and the "warm up" and "cool down" stages of an algorithm that tries to keep all the calculating units busy with meaningful data, and there you have the major reasons for getting 100 MFLOPS out of a 1 GFLOPS system.
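A back-of-the-envelope version of the "purely theoretical figure" mentioned above (my own sketch; all numbers are made-up example values, not a real CPU specification):

```python
# Theoretical peak = cores * clock * SIMD lanes * FLOPs per lane per cycle.
cores = 4
clock_ghz = 3.0
simd_lanes = 8            # e.g. 8 single-precision lanes in a 256-bit vector unit
flops_per_lane_cycle = 2  # e.g. a fused multiply-add counted as two FLOPs

peak_gflops = cores * clock_ghz * simd_lanes * flops_per_lane_cycle
print(f"theoretical peak: {peak_gflops:.0f} GFLOPS")   # 192 GFLOPS for these numbers
```

Measured figures like the 100 MFLOPS example fall short of this kind of number for exactly the reasons listed above.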

Determining hopeless branches early in branch-and-bound algorithms

I have to design a branch-and-bound algorithm that always finds the optimal tour of a graph in the Cartesian plane. I have been given the hint that identifying hopeless branches earlier in the runtime will compound into a program that runs "a hundred times faster". I had the idea of assuming that the shortest edge connected to the starting/ending node will be either the first or last edge in the tour, but a thin diamond-shaped graph proves otherwise. Does anyone have ideas for how to eliminate these hopeless branches, or a reference that talks about this?
Basically, is there a better way to branch into subsets of solutions than just lexicographically, e.g. the first branch includes and excludes edge a-b, the second branch includes and excludes edge a-c?
So somewhere in your branch-and-bound algorithm, you look at possible places to go, and then somehow keep track of them to do later.
To make this more efficient, you can do a couple things:
Write a better bound calculator. In other words, come up with an algorithm that determines the bound more accurately. This will result in less time spent on paths that turn out to be poor.
Instead of using a stack to keep track of things to do, use a queue; and instead of a plain queue, use a priority queue (heap) ordered by bound, so that the things that seem best are at the top of the heap and the things that seem bad are at the bottom (see the sketch below).
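A minimal best-first branch-and-bound skeleton along those lines (my sketch; `bound`, `is_complete`, `children` and `cost` are hypothetical problem-specific callbacks, not anything from the thread):

```python
import heapq
import itertools

def branch_and_bound(root, bound, is_complete, children, cost):
    # Frontier is a priority queue ordered by each node's optimistic bound,
    # so the most promising partial solutions are expanded first.
    best_cost, best_node = float("inf"), None
    tie = itertools.count()                 # tiebreaker so nodes are never compared directly
    frontier = [(bound(root), next(tie), root)]
    while frontier:
        node_bound, _, node = heapq.heappop(frontier)
        if node_bound >= best_cost:
            continue                        # hopeless: bound cannot beat the incumbent
        if is_complete(node):
            if cost(node) < best_cost:
                best_cost, best_node = cost(node), node
            continue
        for child in children(node):
            if bound(child) < best_cost:    # prune hopeless branches as early as possible
                heapq.heappush(frontier, (bound(child), next(tie), child))
    return best_node, best_cost
```

The tighter the `bound` function, the earlier whole subtrees get discarded, which is where the "hundred times faster" hint comes from.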
Nearest-neighbor is a simple algorithm. Branch-and-bound is just an optimizing loop; in addition you need a sub-problem solver. I think nearest-neighbor can also be seen as a branch-and-bound algorithm. Instead, I would look into the simplex algorithm, which is a linear programming algorithm, and also into cutting-plane algorithms for solving TSP.