Equality and inequality constraints in multi-objective optimisation?

This question has been posted on Mathematics Stack Exchange, and I would like to post it here as well to get an answer.
The general form of a multi-objective optimisation problem is the following:
Maximise/Minimise f_m(x), m = 1, 2, …, M;
subject to g_j(x) ≥ 0, j = 1, 2, …, J;
h_k(x) = 0, k = 1, 2, …, K;
x_i^(L) ≤ x_i ≤ x_i^(U), i = 1, 2, …, N;
where f(x) = (f_1(x), …, f_M(x)): R^N → R^M, x = (x_1, x_2, …, x_N) is the vector of the N parameters, M is the number of objective functions, and h_k and g_j are the equality and inequality constraints, respectively, with K and J the numbers of equality and inequality constraints that the solution must satisfy. The last set of constraints are the parameter bounds, restricting each parameter x_i to take a value between a lower bound x_i^(L) and an upper bound x_i^(U).
What are the equality and inequality constraints? What do they do? And how can I know K and J?
I appreciate all the feedback

An optimization problem is a way of modeling a system. Variables, objectives and constraints all come out of that model. Consider a constraint on how much of a resource x_i is available: suppose you have 5 units of whatever x_i is. Then the corresponding inequality constraint is g_j(x) = -x_i + 5 >= 0. Equality constraints come from similar considerations: suppose you must assign exactly three units of x_i. Then you have the equality constraint h_k(x) = x_i - 3 = 0. The counts J and K are simply how many such requirements your model imposes.
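To make this concrete, here is a minimal single-objective sketch (my own illustration, not from the question), assuming scipy is available, showing how the g_j(x) ≥ 0 and h_k(x) = 0 constraints and the parameter bounds map onto a solver's interface:

# Minimal sketch, assuming scipy.optimize; the objective and constraints
# are illustrative stand-ins, not part of the original question.
import numpy as np
from scipy.optimize import minimize

# Objective: minimise f(x) = (x1 - 1)^2 + (x2 - 2)^2
f = lambda x: (x[0] - 1)**2 + (x[1] - 2)**2

constraints = [
    # Inequality constraint g_1(x) = -x1 + 5 >= 0 ("at most 5 of x1"), so J = 1
    {'type': 'ineq', 'fun': lambda x: -x[0] + 5},
    # Equality constraint h_1(x) = x2 - 3 = 0 ("exactly 3 of x2"), so K = 1
    {'type': 'eq', 'fun': lambda x: x[1] - 3},
]

# Parameter bounds x_i^(L) <= x_i <= x_i^(U)
bounds = [(0, 10), (0, 10)]

res = minimize(f, np.zeros(2), bounds=bounds, constraints=constraints)
print(res.x)  # approximately [1. 3.]

Here J = 1 and K = 1 simply because we chose to impose one requirement of each kind; in general, K and J are just counts of how many exact requirements and how many resource-style restrictions your model contains.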

Related

Relaxation of linear constraints?

When we need to optimize a function on the positive real half-line and we only have unconstrained optimization routines, we substitute y = exp(x) or y = x^2, i.e. we optimize over the log or the (signed) square root of the variable, which ranges over the whole real line.
Can we do something similar for linear constraints of the form Ax = b, where, for x a d-dimensional vector, A is an (N, d)-shaped matrix and b is a vector of length N defining the constraints?
While, as Erwin Kalvelagen says, this is not always a good idea, here is one way to do it.
Suppose we take the SVD of A, getting
A = U*S*V'
where, if A is n x m,
U is n x n orthogonal,
S is n x m, zero off the main diagonal,
V is m x m orthogonal.
Computing the SVD is not a trivial computation.
We first zero out the elements of S which we think are non-zero just due to noise -- which can be a slightly delicate thing to do.
Then we can find one solution x~ to
A*x = b
as
x~ = V*pinv(S)*U'*b
(where pinv(S) is the pseudo-inverse of S, i.e. transpose S and replace the non-zero elements of the diagonal by their multiplicative inverses)
Note that x~ is a least-squares solution to the constraints, so we need to check that it is close enough to being a real solution, i.e. that A*x~ is close enough to b -- another somewhat delicate thing. If x~ doesn't satisfy the constraints closely enough, you should give up: if the constraints have no solution, neither does the optimisation.
Any other solution to the constraints can be written
x = x~ + sum c[i]*V[i]
where the V[i] are the columns of V corresponding to entries of S that are (now) zero. Here the c[i] are arbitrary constants. So we can change variables to using the c[] in the optimisation, and the constraints will be automatically satisfied. However this change of variables could be somewhat irksome!
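Here is a minimal numerical sketch of this recipe, assuming numpy; the function name and the tolerance are illustrative choices:

# Sketch of the SVD reparameterisation described above, assuming numpy.
import numpy as np

def nullspace_parametrisation(A, b, tol=1e-10):
    # A = U*S*V' ; np.linalg.svd returns U, the diagonal of S, and V'
    U, s, Vt = np.linalg.svd(A)
    r = np.sum(s > tol)  # zero out singular values we attribute to noise
    # One least-squares solution: x~ = V*pinv(S)*U'*b
    x_tilde = Vt[:r].T @ (U[:, :r].T @ b / s[:r])
    # Columns of V whose singular values are (now) zero span the null space
    V_null = Vt[r:].T
    return x_tilde, V_null

A = np.array([[1.0, 1.0, 0.0]])  # single constraint: x1 + x2 = 2
b = np.array([2.0])
x_tilde, V_null = nullspace_parametrisation(A, b)
assert np.allclose(A @ x_tilde, b)  # the "is A*x~ close enough to b" check

# Any choice of c now gives a feasible x, so optimising over c is unconstrained
c = np.random.randn(V_null.shape[1])
assert np.allclose(A @ (x_tilde + V_null @ c), b)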

Comparing complexity of O(n+m) and O(max(n,m))

I had a job interview today and was asked about the complexity of std::set_intersection. When I was answering, I mentioned that
O(n+m)
is equal to
O(max(n,m)).
I was told that this is incorrect. I was unsuccessfully trying to show the equivalence with:
O(0.5*(n+m)) ≤ O(max(n,m)) ≤ O(n+m)
My question is: am I really incorrect?
For all m, n ≥ 0 it holds that max(m, n) ≤ m + n, hence max(m, n) ∈ O(m + n); and m + n ≤ 2·max(m, n), hence m + n ∈ O(max(m, n)).
Thus O(max(m, n)) = O(m + n).
ADDENDUM: If f belongs to O(m + n), then a constant D > 0 exists such that f(n, m) < D·(m + n) for m and n large enough. Since m + n ≤ 2·max(m, n), it follows that f(n, m) < 2D·max(m, n), so O(m + n) must be a subset of O(max(m, n)). The proof that O(max(m, n)) is a subset of O(m + n) is made analogously.
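Not a proof, of course, but here is a quick numeric sanity check (a Python sketch) of the two inequalities used above:

# Sanity check, not a proof: max(m, n) <= m + n <= 2*max(m, n) for m, n >= 0.
from itertools import product

assert all(max(m, n) <= m + n <= 2 * max(m, n)
           for m, n in product(range(100), repeat=2))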
You are absolutely right that O(n+m) is equal to O(max(n,m)); even more precisely, we can prove Θ(n+m) = Θ(max(n,m)), which is tighter and proves your statement. The mathematical proof (for both big-O and Θ) is very simple and easy to understand with common sense: a proof states things in a well-defined, strict way that leaves no ambiguity.
Though you were (wrongly) told that this is incorrect: if we want to be very precise, "equal" is not the appropriate mathematical way to express that the order of max(m,n) is the same as that of m+n. You used the words "is equal" referring to big-O notation, but what is the definition of big-O notation?
It refers to sets. Saying max(n,m) belongs to O(m+n) is the
most correct way, and vice versa m+n belongs to O(max(m,n)). In big-O
notation it is commonly used and accepted to write m+n = O(max(n,m)).
The problem is that you didn't refer to the order of a function, as in "f is O(g)", but instead tried to compare the sets O(f) and O(g), and proving that two infinite sets are equal is not easy (which may have confused the interviewer).
We can say sets A and B are identical (or equal) when they contain the same elements; for finite sets we can simply check the elements one by one, but even this notion of identity can't be easily applied when talking about big-O sets.
Big O of F is used to denote the set that contains all functions
whose order of growth is at most that of F. How many
functions are there?
Infinitely many, since F+c is contained for every constant c.
How could you say two different sets are identical (or equal) when they are
infinite? Well, it is not that simple.
So I understand what you are thinking, that n+m and max(n,m) have the same
order, but **the right way to express that** is by saying n+m is
O(max(n,m)) and max(n,m) is O(m+n) (the claim that O(m+n) is equal to O(max(m,n))
properly requires a proof).
One more thing: we said that these functions have the same order, and this is absolutely mathematically correct, but when optimising an algorithm you may need to take some lower-order factors into account, and then the two expressions may give you slightly different results; the asymptotic behaviour, however, is provably the same.
CONCLUSION
As you can read on Wikipedia (and in all CS courses in every university, and in every algorithms book), the big O/Θ/Ω/ω/o notations help us compare functions and find their order of growth, not compare sets of functions, and this is why you were told you were wrong. Though it is easy to say that O(n) is a subset of O(n^2), it is very difficult to compare infinite sets and say whether two of them are identical. Cantor worked on categorising infinite sets; for example, we know that the natural numbers are countably infinite and the real numbers are uncountably infinite, so in a precise sense there are more real numbers than natural numbers even though both sets are infinite. It gets very complicated when trying to order and categorise infinite sets, and that would be more of a research topic in maths than a way of comparing functions.
UPDATE
It turns out you can simply prove that O(n+m) equals O(max(n,m)):
For every function F which belongs to O(n+m), there are constants c and k such that:
F ≤ c·(n+m) for every n ≥ k and m ≥ k.
Then it also holds that:
F ≤ c·(n+m) ≤ 2c·max(n,m),
so F belongs to O(max(n,m)), and as a result O(m+n) is a subset of O(max(n,m)).
Now consider an F that belongs to O(max(n,m)); then there are constants c and k such that:
F ≤ c·max(n,m) for every n ≥ k and m ≥ k,
and we also have:
F ≤ c·max(n,m) ≤ 2c·(m+n) for every n ≥ k and m ≥ k,
so with c' = 2c and the same k, by definition F is O(m+n), and as a result O(max(n,m)) is a subset of O(n+m). Because we proved both inclusions, we have proved that O(max(m,n)) and O(m+n) are equal, and this mathematical proof shows, without any doubt, that you were completely right.
Finally, note that proving that m+n is O(max(n,m)) and max(n,m) is O(m+n) does not immediately prove that the sets are equal (that needs its own proof, as above); it just proves that the functions have the same order, without examining the sets. Though it is easy to see that, in the general case, if f is O(g) and g is O(f), then the equality of the big-O sets can be proved just as we did in the previous paragraph.
We'll show by rigorous Big-O analysis that you are indeed correct, given one possible choice of parameter of growth in your analysis. However, this does not necessarily mean that the viewpoint of the interviewer is incorrect, rather that his/her choice of parameter of growth differs. His/her claim that your answer was outright incorrect, however, is questionable: you've possibly simply used two slightly different approaches to analyzing the asymptotic complexity of std::set_intersection, both leading to the general consensus that the algorithm runs in linear time.
Preparations
Let's start by looking at the reference for std::set_intersection at cppreference (emphasis mine):
http://en.cppreference.com/w/cpp/algorithm/set_intersection
Parameters
first1, last1 - the first range of elements to examine
first2, last2 - the second range of elements to examine
Complexity
At most 2·(N1+N2-1) comparisons, where
N1 = std::distance(first1, last1)
N2 = std::distance(first2, last2)
std::distance itself is naturally linear (worst case: no random access)
std::distance
...
Returns the number of elements between first and last.
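To see where the 2·(N1+N2-1) bound comes from, here is an illustrative sketch (in Python, and not the actual standard-library source) of the classic merge-style intersection of two sorted ranges; every iteration advances at least one index, so there are at most N1+N2-1 iterations, each performing at most two comparisons:

# Illustrative merge-style intersection of sorted ranges (not library source).
def set_intersection(a, b):
    # a and b must be sorted, mirroring std::set_intersection's precondition
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:      # first comparison
            i += 1
        elif b[j] < a[i]:    # second comparison (hence the factor of 2)
            j += 1
        else:                # equal: emit and advance both indices
            out.append(a[i])
            i += 1
            j += 1
    return out

assert set_intersection([1, 2, 4, 6], [2, 3, 4, 7]) == [2, 4]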
We'll proceed to briefly recall the basics of Big-O notation.
Big-O notation
We loosely state the definition of a function or algorithm f being in O(g(n)) (to be picky, O(g(n)) being a set of functions, hence f ∈ O(...), rather than the commonly misused f(n) ∈ O(...)).
If a function f is in O(g(n)), then c · g(n) is an upper
bound on f(n), for some non-negative constant c such that f(n) ≤ c · g(n)
holds, for sufficiently large n (i.e. , n ≥ n0 for some constant
n0).
Hence, to show that f ∈ O(g(n)), we need to find a set of (non-negative) constants (c, n0) that fulfils
f(n) ≤ c · g(n), for all n ≥ n0, (+)
We note, however, that this set is not unique; the problem of finding the constants (c, n0) such that (+) holds is degenerate. In fact, if any such pair of constants exists, there will exist an infinite amount of different such pairs.
We proceed with the Big-O analysis of std::set_intersection, based on the already known worst case number of comparisons of the algorithm (we'll consider one such comparison a basic operation).
Applying Big-O asymptotic analysis to the set_intersection example
Now consider two ranges of elements, say range1 and range2, and assume that the number of elements contained in these two ranges are m and n, respectively.
Note! Already at this initial stage of the analysis we make a choice: we choose to study the problem in terms of two different parameters of growth (or rather, focusing on the largest one of these two). As we shall see ahead, this will lead to the same asymptotic complexity as the one stated by the OP. However, we could just as well choose to let k = m+n be the parameter of choice: we would still conclude that std::set_intersection is of linear-time complexity, but rather in terms of k (which is m+n, which is not max(m, n)) than the largest of m and n. These are simply the preconditions we freely choose to set prior to proceeding with our Big-O notation/asymptotic analysis, and it's quite possible that the interviewer had a preference for analyzing the complexity using k as the parameter of growth rather than the largest of its two components.
Now, from above we know that as worst case, std::set_intersection will run 2 · (m + n - 1) comparisons/basic operations. Let
h(n, m) = 2 · (m + n - 1)
Since the goal is to find an expression of the asymptotic complexity in terms of Big-O (upper bound), we may, without loss of generality, assume that n > m, and define
f(n) = 2 · (n + n - 1) = 4n - 2 > h(n,m) (*)
We proceed to analyze the asymptotic complexity of f(n), in terms of Big-O notation. Let
g(n) = n
and note that
f(n) = 4n - 2 < 4n = 4 · g(n)
Now (choose to) let c = 4 and n0 = 1, and we can state the fact that:
f(n) < 4 · g(n) = c · g(n), for all n ≥ n0, (**)
Given (**), we know from (+) that we've now shown that
f ∈ O(g(n)) = O(n)
Furthermore, since (*) holds, naturally
h ∈ O(g(n)) = O(n), assuming n > m (i)
holds.
If we switch our initial assumption and assume that m > n, re-tracing the analysis above will, conversely, yield the similar result
h ∈ O(g(m)) = O(m), assuming m > n (ii)
Conclusion
Hence, given two ranges range1 and range2 holding m and n elements, respectively, we've shown that the asymptotic complexity of std::set_intersection applied to these two ranges is indeed
O(max(m, n))
where we've chosen the largest of m and n as the parameter of growth of our analysis.
This is, however, not really a valid notation (or at least not a common one) when speaking about Big-O notation. When we use Big-O notation to describe the asymptotic complexity of some algorithm or function, we do so with regard to a single parameter of growth (not two of them).
Rather than answering that the complexity is O(max(m, n)), we may, without loss of generality, assume that n describes the number of elements in the range with the most elements, and, given that assumption, simply state that an upper bound for the asymptotic complexity of std::set_intersection is O(n) (linear time).
A speculation as to the interview feedback: as mentioned above, it's possible that the interviewer simply had a firm view that the Big-O notation/asymptotic analysis should have been based on k = m+n as the parameter of growth rather than the largest of its two components. Another possibility could, naturally, be that the interviewer confusingly asked about the actual worst-case number of comparisons of std::set_intersection, mixing this up with the separate matter of Big-O notation and asymptotic complexity.
Final remarks
Finally, note that the analysis of the worst-case complexity of std::set_intersection is not at all representative of the commonly studied non-ordered set intersection problem: the former is applied to ranges that are already sorted (see the quote from Boost's set_intersection below; the origin of std::set_intersection), whereas in the latter we study the computation of the intersection of non-ordered collections.
Boost: set_intersection
Description
set_intersection constructs a sorted range that is the intersection
of the sorted ranges rng1 and rng2. The return value is the
end of the output range.
As an example of the latter, Python's set intersection method applies to non-ordered collections; applied to sets s and t, it has an average-case and a worst-case complexity of O(min(len(s), len(t))) and O(len(s) * len(t)), respectively. The huge difference between average and worst case in this implementation stems from the fact that hash-based solutions generally work very well in practice but can, for some applications, theoretically have very poor worst-case performance.
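For intuition, here is a sketch (illustrative, not CPython's actual implementation) of why the average case is O(min(len(s), len(t))): a hash-based intersection only iterates over the smaller set, performing expected constant-time lookups in the larger one:

# Illustrative sketch, not CPython's actual implementation.
def intersect(s, t):
    # Iterate over the smaller set; each membership test is O(1) on average,
    # giving O(min(len(s), len(t))) expected time. Degenerate hash collisions
    # can make each lookup linear, hence the O(len(s) * len(t)) worst case.
    if len(s) > len(t):
        s, t = t, s
    return {x for x in s if x in t}

assert intersect({1, 2, 3}, {2, 3, 4, 5}) == {2, 3}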
For additional details of the latter problem, see e.g.
Intersection of two unsorted sets or lists # SE-CSTheory

Checking containment using integer programming

For this question, a region is a subset of Z^d defined by finitely many linear inequalities with integer coefficients, where Z^d is the set of d-tuples of integers. For example, the set of pairs (x, y) of non-negative integers with 2x+3y >= 10 constitutes a region with d=2 (non-negativity just imposes the additional inequalities x>=0 and y>=0).
Question: is there a good way, using integer programming (or something else?), to check if one region is contained in a union of finitely many other regions?
I know one way to check containment, which I describe below, but I'm hoping someone may be able to offer some improvements, as it's not too efficient.
Here's the way I know to check containment. First, integer programming libraries can directly check if a region is empty: in integer programming terminology (as I understand it), emptiness of a region corresponds to infeasibility of a model. I have coded up something using the gurobi library to check emptiness, and it seems to work well in practice for the kind of regions I care about.
Suppose now that we want to check if a region X is contained in another region Y (a special case of the question). Let Z be the intersection of X with the complement of Y. Then X is contained in Y if and only if Z is empty. Now, Z itself is not a region in my sense of the word, but it is a union of regions Z_1, ..., Z_n, where n is the number of inequalities used to define Y. We can check if Z is empty by checking that each of Z_1, ..., Z_n is empty, and we can do this as described above.
The general case can be handled in exactly the same way: if Y is a finite union of regions Y_1, ..., Y_k then Z is still a finite union of regions Z_1, ..., Z_n, and so we just check that each Z_i is empty. If Y_i is defined by m_i inequalities then n = m_1 * m_2 * ... * m_k.
So to summarize, we can reduce the containment problem to the emptiness problem, which the library can solve directly. The issue is that we may have to solve a very large number of emptiness problems to solve containment (e.g., if each Y_i is defined by only two inequalities then n = 2^k grows exponentially with k), and so this may take a lot of time.
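Here is a sketch of that reduction, assuming gurobipy; representing a region as a list of (a, b) pairs meaning a·x >= b, and all function names, are illustrative choices of mine, not the poster's actual code:

# Sketch: containment of a region in a union of regions via emptiness checks.
from itertools import product
import gurobipy as gp
from gurobipy import GRB

def is_empty(ineqs, d):
    # Is {x in Z^d : a.x >= b for every (a, b) in ineqs} empty?
    # Emptiness of a region == infeasibility of the integer model.
    m = gp.Model()
    m.Params.OutputFlag = 0
    x = m.addVars(d, vtype=GRB.INTEGER, lb=-GRB.INFINITY)
    for a, b in ineqs:
        m.addConstr(gp.quicksum(a[i] * x[i] for i in range(d)) >= b)
    m.optimize()
    return m.Status == GRB.INFEASIBLE

def contained_in_union(X, Ys, d):
    # X is contained in Y_1 ∪ ... ∪ Y_k iff every Z below is empty.
    # Each Z picks one inequality a.x >= b from each Y_i and negates it:
    # over the integers, not(a.x >= b) is a.x <= b - 1, i.e. (-a).x >= 1 - b.
    for choice in product(*Ys):
        negated = [([-ai for ai in a], 1 - b) for (a, b) in choice]
        if not is_empty(X + negated, d):
            return False  # found an integer point of X outside every Y_i
    return True

This also makes the blow-up explicit: product(*Ys) enumerates all m_1 · m_2 · … · m_k combinations of negated inequalities, one emptiness check each.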
You can't really expect a simple answer. Suppose that A is defined by all constraints of the form 0 <= x_i <= 1. A can be thought of as the collection of all possible rows of a truth table. Given any logical expression of the form e.g. x or (not y) or z, you can express it as a linear inequality
such as x + (1-y) + z >= 1 (along with the 0-1 constraints). Using this approach, any Boolean formula in conjunctive normal form (CNF) can be expressed as a region in Z^n. If A is defined as above and B_1, B_2, ...., B_k is a list of regions corresponding to CNFs then A is contained in the union of the B_i if and only if the disjunction of those CNFs is a tautology. But tautology-checking is a canonical example of an NP-complete problem.
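As a concrete instance of that encoding (a sketch; the clause representation is an illustrative choice): each CNF clause becomes one inequality over 0-1 variables, where a positive literal x contributes x and a negated literal not-y contributes (1 - y):

# Sketch of the clause-to-inequality encoding described above.
def clause_to_inequality(clause, d):
    # clause: list of (variable index, sign) pairs, sign +1 for x, -1 for not-x.
    # Returns (a, b) meaning a.x >= b over 0/1 variables, satisfied by exactly
    # the 0/1 assignments that satisfy the clause.
    a = [0] * d
    b = 1
    for var, sign in clause:
        if sign > 0:
            a[var] += 1   # literal x contributes x
        else:
            a[var] -= 1   # literal not-y contributes (1 - y): fold the 1 into b
            b -= 1
    return a, b

# (x or not y or z): x + (1 - y) + z >= 1, i.e. x - y + z >= 0
assert clause_to_inequality([(0, +1), (1, -1), (2, +1)], 3) == ([1, -1, 1], 0)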
None of this is to say that it can't be usefully reduced to ILP (which itself is NP-complete). I don't see any direct way to do so, though I suspect that some of the techniques used to identify redundant constraints would be relevant.

What is the relationship between loop invariant and weakest precondition

Given a loop invariant, Wikipedia lists a nice way to produce the weakest precondition for a loop
(from http://en.wikipedia.org/wiki/Predicate_transformer_semantics):
wp(while E inv I do S, R) =
I \wedge
\forall y. ((E \wedge I) \implies wp(S,I \wedge x < y))[x <- y] \wedge
\forall y. ((\neg E \wedge I) \implies R)[x <- y]
where y is a tuple of fresh variables.
M[x <- N] substitutes all the occurrences of x in M with N.
Now, my problem is the variable y. The \forall y binds y in the expression, so "y is a tuple of fresh variables" doesn't parse for me. Is the y bound in the \forall the same as the y in "[x <- y]"? I simply cannot parse the above.
Edit: Rephrased to avoid reference request.
My question is: what is the direct connection between loop invariants and computing weakest preconditions, if any? It seems that in practice one often relaxes the weakest precondition for loops to a precondition that is merely suitable for verification. The above from Wikipedia suggests that, given a loop invariant, one can indeed compute the weakest precondition on the nose, but I am having trouble understanding this condition.
The syntax “x <- y” in the rule that you quote means the simultaneous substitution of several variables that we can assume to be named x1, x2, … xn respectively by other variables y1, y2, … yn which, as you point out, are bound immediately above in the formula by a \forall.
The way to apply the rule in practice is to pick a set of variables occurring in the predicate R. The number and the name of these variables is left to the choice of the person who applies the rule, but it must be possible to define a well-founded relation < between tuples of the chosen arity such that \forall y. ((E \wedge I) \implies wp(S,I \wedge x < y))[x <- y] will be provable eventually.
This is what the Wikipedia article means when it says "Weakest-precondition of while-loop is usually parametrized by a predicate I called loop invariant, and a well-founded relation on the space state denoted < and called loop variant." It is not just I that must have been chosen in advance and must decorate the program: there is also the choice of a number of variables of the program, modified in the body S of the loop and occurring in the condition E, and it is the existence of the well-founded order < between the tuples of values of these variables that guarantees that the condition E eventually becomes false.
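To make the substitution concrete, here is a small worked instance (my own example, not from the article). Take the loop while x > 0 inv x >= 0 do x := x - 1, the post-condition R = (x = 0), the single chosen variable x, and the usual well-founded order < on the non-negative integers. Since wp(x := x - 1, x >= 0 \wedge x < y) = (x - 1 >= 0 \wedge x - 1 < y), the rule instantiates to

wp(while x > 0 inv x >= 0 do x := x - 1, x = 0) =
x >= 0
\wedge \forall y. ((y > 0 \wedge y >= 0) \implies (y - 1 >= 0 \wedge y - 1 < y))
\wedge \forall y. ((\neg(y > 0) \wedge y >= 0) \implies y = 0)

where the substitution [x <- y] has replaced the free occurrences of x by the y bound by the \forall (so yes: they are the same y). Both \forall-conjuncts are valid over the integers, so the weakest precondition reduces to the invariant x >= 0.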
This is much easier to understand with actual verification systems in which it is possible to try out things. Read this tutorial up to section 2.3 Checking Termination and see if the practical version of the same explanation makes more sense to you.

fmincon : impose vector greater than zero constraint

How do you impose a constraint that all values in a vector you are trying to optimize for are greater than zero, using fmincon()?
According to the documentation, I need some parameters A and b, where A*x ≤ b, but I think that if I make A a vector of -1's and b zero, then I will only have constrained the sum of the elements of x to be non-negative, instead of each element of x individually.
Just in case you need it, here is my code. I am trying to optimize over a vector (x) such that the (componentwise) product of x and a matrix (called multiplierMatrix) makes a matrix for which the sum of the columns is x.
function [sse] = myfun(x) % this is a nested function
bigMatrix = repmat(x,1,120) .* multiplierMatrix;
answer = sum(bigMatrix,1)';
sse = sum((expectedAnswer - answer).^2);
end
xGuess = ones(120,1);
[sse xVals] = fmincon(@myfun,xGuess,???);
Let me know if I need to explain my problem better. Thanks for your help in advance!
You can use the lower bound:
xGuess = ones(120,1);
lb = zeros(120,1);
[sse xVals] = fmincon(@myfun,xGuess, [],[],[],[], lb);
note that xVals and sse should probably be swapped (if their names mean anything), since fmincon returns the solution first and the objective value second.
The lower bound lb means that elements in your decision variable x will never fall below the corresponding element in lb, which is what you are after here.
The empty arrays ([]) indicate that you're not using linear constraints (A, b, Aeq, beq), only the lower bound lb.
Some advice: fmincon is a pretty advanced function. You'd better memorize the documentation on it, and play with it for a few hours, using many different example problems.