Related
I'm struggling a bit finding a fast algorithm that's suitable.
I just want to minimize:
norm2(x-s)
st
G.x <= h
x >= 0
sum(x) = R
G is sparse and contains only 1s (and zeros obviously).
In the case of iterative algorithms, it would be nice to get the interim solutions to show to the user.
The context is that s is a vector of current results, and the user is saying "well the sum of these few entries (entries indicated by a few 1.0's in a row in G) should be less than this value (a row in h). So we have to remove quantities from the entries the user specified (indicated by 1.0 entries in G) in a least-squares optimal way, but since we have a global constraint on the total (R) the values removed need to be allocated in a least-squares optimal way amongst the other entries. The entries can't go negative.
All the algorithms I'm looking at are much more general, and as a result are much more complex. Also, they seem quite slow. I don't see this as a complex problem, although mixes of equality and inequality constraints always seem to make things more complex.
This has to be called from Python, so I'm looking at Python libraries like qpsolvers and scipy.optimize. But I suppose Java or C++ libraries could be used and called from Python, which might be good since multithreading is better in Java and C++.
Any thoughts on what library/package/approach to use to best solve this problem?
The size of the problem is about 150,000 rows in s, and a few dozen rows in G.
Thanks!
Your problem is a linear least squares:
minimize_x norm2(x-s)
such that G x <= h
x >= 0
1^T x = R
Thus it fits the bill of the solve_ls function in qpsolvers.
Here is an instance of how I imagine your problem matrices would look like, given what you specified. Since it is sparse we should use SciPy CSC matrices, and regular NumPy arrays for vectors:
import numpy as np
import scipy.sparse as spa
n = 150_000
# minimize 1/2 || x - s ||^2
R = spa.eye(n, format="csc")
s = np.array(range(n), dtype=float)
# such that G * x <= h
G = spa.diags(
diagonals=[
[1.0 if i % 2 == 0 else 0.0 for i in range(n)],
[1.0 if i % 3 == 0 else 0.0 for i in range(n - 1)],
[1.0 if i % 5 == 0 else 0.0 for i in range(n - 1)],
],
offsets=[0, 1, -1],
)
a_dozen_rows = np.linspace(0, n - 1, 12, dtype=int)
G = G[a_dozen_rows]
h = np.ones(12)
# such that sum(x) == 42
A = spa.csc_matrix(np.ones((1, n)))
b = np.array([42.0]).reshape((1,))
# such that x >= 0
lb = np.zeros(n)
Next, we can solve this problem with:
from qpsolvers import solve_ls
x = solve_ls(R, s, G, h, A, b, lb, solver="osqp", verbose=True)
Here I picked CVXOPT but there are other open-source solvers you can install such as ProxQP, OSQP or SCS. You can install a set of open-source solvers by: pip install qpsolvers[open_source_solvers]. After some solvers are installed, you can list those for sparse matrices by:
print(qpsolvers.sparse_solvers)
Finally, here is some code to check that the solution returned by the solver satisfies our constraints:
tol = 1e-6 # tolerance for checks
print(f"- Objective: {0.5 * (x - s).dot(x - s):.1f}")
print(f"- G * x <= h: {(G.dot(x) <= h + tol).all()}")
print(f"- x >= 0: {(x + tol >= 0.0).all()}")
print(f"- sum(x) = {x.sum():.1f}")
I just tried it with OSQP (adding the eps_rel=1e-5 keyword argument when calling solve_ls, otherwise the returned solution would be less accurate than the tol = 1e-6 tolerance) and it found a solution is 737 milliseconds on my (rather old) CPU with:
- Objective: 562494373088866.8
- G * x <= h: True
- x >= 0: True
- sum(x) = 42.0
Hoping this helps. Happy solving!
So there was a puzzle:
This equation is incomplete: 1 2 3 4 5 6 7 8 9 = 100. One way to make
it accurate is by adding seven plus and minus signs, like so: 1 + 2 +
3 – 4 + 5 + 6 + 78 + 9 = 100.
How can you do it using only 3 plus or minus signs?
I'm quite new to Prolog, solved the puzzle, but i wonder how to optimize it
makeInt(S,F,FinInt):-
getInt(S,F,0,FinInt).
getInt(Start, Finish, Acc, FinInt):-
0 =< Finish - Start,
NewAcc is Acc*10 + Start,
NewStart is Start +1,
getInt(NewStart, Finish, NewAcc, FinInt).
getInt(Start, Finish, A, A):-
0 > Finish - Start.
itCounts(X,Y,Z,Q):-
member(XLastDigit,[1,2,3,4,5,6]),
FromY is XLastDigit+1,
numlist(FromY, 7, ListYLastDigit),
member(YLastDigit, ListYLastDigit),
FromZ is YLastDigit+1,
numlist(FromZ, 8, ListZLastDigit),
member(ZLastDigit,ListZLastDigit),
FromQ is ZLastDigit+1,
member(YSign,[-1,1]),
member(ZSign,[-1,1]),
member(QSign,[-1,1]),
0 is XLastDigit + YSign*YLastDigit + ZSign*ZLastDigit + QSign*9,
makeInt(1, XLastDigit, FirstNumber),
makeInt(FromY, YLastDigit, SecondNumber),
makeInt(FromZ, ZLastDigit, ThirdNumber),
makeInt(FromQ, 9, FourthNumber),
X is FirstNumber,
Y is YSign*SecondNumber,
Z is ZSign*ThirdNumber,
Q is QSign*FourthNumber,
100 =:= X + Y + Z + Q.
Not sure this stands for an optimization. The code is just shorter:
sum_123456789_eq_100_with_3_sum_or_sub(L) :-
append([G1,G2,G3,G4], [0'1,0'2,0'3,0'4,0'5,0'6,0'7,0'8,0'9]),
maplist([X]>>(length(X,N), N>0), [G1,G2,G3,G4]),
maplist([G,F]>>(member(Op, [0'+,0'-]),F=[Op|G]), [G2,G3,G4], [F2,F3,F4]),
append([G1,F2,F3,F4], L),
read_term_from_codes(L, T, []),
100 is T.
It took me a while, but I got what your code is doing. It's something like this:
itCounts(X,Y,Z,Q) :- % generate X, Y, Z, and Q s.t. X+Y+Z+Q=100, etc.
generate X as a list of digits
do the same for Y, Z, and Q
pick the signs for Y, Z, and Q
convert all those lists of digits into numbers
verify that, with the signs, they add to 100.
The inefficiency here is that the testing is all done at the last minute. You can improve the efficiency if you can throw out some possible solutions as soon as you pick one of your numbers, that is, testing earlier.
itCounts(X,Y,Z,Q) :- % generate X, Y, Z, and Q s.t. X+Y+Z+Q=100, etc.
generate X as a list of digits, and convert it to a number
if it's so big or small the rest can't possibly bring the sum back to 100, fail
generate Y as a list of digits, convert to number, and pick it sign
if it's so big or so small the rest can't possibly bring the sum to 100, fail
do the same for Z
do the same for Q
Your function is running pretty fast already, even if I search all possible solutions. It only picks 6 X's; 42 Y's; 224 Z's; and 15 Q's. I don't think optimizing will be worth your while.
But if you really wanted to: I tested this by putting a testing function immediately after selecting an X. It reduced the 6 X's to 3 (all before finding the solution); 42 Y's to 30; 224 Z's to 184; and 15 Q's to 11. I believe we could reduce it further by testing immediately after a Y is picked, to see whether X YSign Y is already so large or small there can be no solution.
In PROLOG programs that are more computationally intensive, putting parts of the 'test' earlier in 'generate and test' algorithms can help a lot.
Suppose we have N (in this example N = 3) events that can happen depending on some variables. Each of them can generate certain profit or loses (event1 = 300, event2 = -100, event3 = 200), they are constrained by rules when they happen.
event 1 happens only when x > 5,
event 2 happens only when x = 2 and y = 3
event 3 happens only when x is odd.
The problem is to know the maximum profit.
Assume x, y are integer numbers >= 0
In the real problem there are many events and many dimensions.
(the solution should not be specific)
My question is:
Is this linear programming problem? If yes please provide solution to the example problem using this approach. If no please suggest some algorithms to optimize such problem.
This can be formulated as a mixed integer linear program. This is a linear program where some of the variables are constrained to be integer. Contrary to linear programs, solving the general integer program is NP-hard. However, there are many commercial or open source solvers that can solve efficiently large-scale problems. For up to 300 variables and constraints, you can use excel's solver.
Here is a way to formulate the above constraints:
If you go down this route, you might find this document useful.
the last constraint in an interesting one. I am assuming that x has to be integer, but if x can be either integer or continuous I will edit the answer accordingly.
I hope this helps!
Edit: L and U above should be interpreted as L1 and U1.
Edit 2: z2 needs to changed to (1-z2) on the 3rd and 4th constraint.
A specific answer:
seems more like a mathematical calculation than a programming problem, can't you just run a loop for x= 1->1000 to see what results occur?
for the example:
as x = 2 or 3 = -200 then x > 2 or 3, and if x < 5 doesn't get the 300, so all that is really happening is x > 5 and x = odd = maximum results.
x = 7 = 300 + 200 . = maximum profit for x
A general answer:
I don't see how to answer the question without seeing what the events are and how the events effect X ? Weather it's a linear or functional (mathematical) answer seems rather beside the point of finding the desired solution.
The following inequality system is solved for x1 and x2 over the integers.
x1 + x2 = l
x1 >= y1
x2 >= y2
x1 <= z1
x2 <= z2
l - z1 <= x2
l - z2 <= x1
l,y1,y2,z1,z2 are arbitrary but fixed and >= 0.
With the example values
l = 8
y1 = 1
y2 = 2
z1 = z2 = 6
I solve the system and get the following equations:
2 <= x1 <= 6
x2 = 8 - x1
When I tell WolframAlpha that it should solve it over the integers, it only outputs all possible values.
My question is whether I can derive equations/ranges for x1 and x2 for any given l,y1,y2,z1,z2 programmatically. This problem is related to constraint programming and I found an old paper about this problem: "Compiling Constraint Solving using Projection" by Harvey et al.
Is this approach used in any modern constraint solving libraries?
The reason I ask this is that I need to solve systems like the above several thousand times with different parameters and this takes a long time if the whole system is read/optimized/solved over and over again. Therefore, if I could compile my parameterized systems once and then just use the compiled versions I expect a massive speed gain.
I am processing a series of points which all have the same Y value, but different X values. I go through the points by incrementing X by one. For example, I might have Y = 50 and X is the integers from -30 to 30. Part of my algorithm involves finding the distance to the origin from each point and then doing further processing.
After profiling, I've found that the sqrt call in the distance calculation is taking a significant amount of my time. Is there an iterative way to calculate the distance?
In other words:
I want to efficiently calculate: r[n] = sqrt(x[n]*x[n] + y*y)). I can save information from the previous iteration. Each iteration changes by incrementing x, so x[n] = x[n-1] + 1. I can not use sqrt or trig functions because they are too slow except at the beginning of each scanline.
I can use approximations as long as they are good enough (less than 0.l% error) and the errors introduced are smooth (I can't bin to a pre-calculated table of approximations).
Additional information:
x and y are always integers between -150 and 150
I'm going to try a couple ideas out tomorrow and mark the best answer based on which is fastest.
Results
I did some timings
Distance formula: 16 ms / iteration
Pete's interperlating solution: 8 ms / iteration
wrang-wrang pre-calculation solution: 8ms / iteration
I was hoping the test would decide between the two, because I like both answers. I'm going to go with Pete's because it uses less memory.
Just to get a feel for it, for your range y = 50, x = 0 gives r = 50 and y = 50, x = +/- 30 gives r ~= 58.3. You want an approximation good for +/- 0.1%, or +/- 0.05 absolute. That's a lot lower accuracy than most library sqrts do.
Two approximate approaches - you calculate r based on interpolating from the previous value, or use a few terms of a suitable series.
Interpolating from previous r
r = ( x2 + y2 ) 1/2
dr/dx = 1/2 . 2x . ( x2 + y2 ) -1/2 = x/r
double r = 50;
for ( int x = 0; x <= 30; ++x ) {
double r_true = Math.sqrt ( 50*50 + x*x );
System.out.printf ( "x: %d r_true: %f r_approx: %f error: %f%%\n", x, r, r_true, 100 * Math.abs ( r_true - r ) / r );
r = r + ( x + 0.5 ) / r;
}
Gives:
x: 0 r_true: 50.000000 r_approx: 50.000000 error: 0.000000%
x: 1 r_true: 50.010000 r_approx: 50.009999 error: 0.000002%
....
x: 29 r_true: 57.825065 r_approx: 57.801384 error: 0.040953%
x: 30 r_true: 58.335225 r_approx: 58.309519 error: 0.044065%
which seems to meet the requirement of 0.1% error, so I didn't bother coding the next one, as it would require quite a bit more calculation steps.
Truncated Series
The taylor series for sqrt ( 1 + x ) for x near zero is
sqrt ( 1 + x ) = 1 + 1/2 x - 1/8 x2 ... + ( - 1 / 2 )n+1 xn
Using r = y sqrt ( 1 + (x/y)2 ) then you're looking for a term t = ( - 1 / 2 )n+1 0.36n with magnitude less that a 0.001, log ( 0.002 ) > n log ( 0.18 ) or n > 3.6, so taking terms to x^4 should be Ok.
Y=10000
Y2=Y*Y
for x=0..Y2 do
D[x]=sqrt(Y2+x*x)
norm(x,y)=
if (y==0) x
else if (x>y) norm(y,x)
else {
s=Y/y
D[round(x*s)]/s
}
If your coordinates are smooth, then the idea can be extended with linear interpolation. For more precision, increase Y.
The idea is that s*(x,y) is on the line y=Y, which you've precomputed distances for. Get the distance, then divide it by s.
I assume you really do need the distance and not its square.
You may also be able to find a general sqrt implementation that sacrifices some accuracy for speed, but I have a hard time imagining that beating what the FPU can do.
By linear interpolation, I mean to change D[round(x)] to:
f=floor(x)
a=x-f
D[f]*(1-a)+D[f+1]*a
This doesn't really answer your question, but may help...
The first questions I would ask would be:
"do I need the sqrt at all?".
"If not, how can I reduce the number of sqrts?"
then yours: "Can I replace the remaining sqrts with a clever calculation?"
So I'd start with:
Do you need the exact radius, or would radius-squared be acceptable? There are fast approximatiosn to sqrt, but probably not accurate enough for your spec.
Can you process the image using mirrored quadrants or eighths? By processing all pixels at the same radius value in a batch, you can reduce the number of calculations by 8x.
Can you precalculate the radius values? You only need a table that is a quarter (or possibly an eighth) of the size of the image you are processing, and the table would only need to be precalculated once and then re-used for many runs of the algorithm.
So clever maths may not be the fastest solution.
Well there's always trying optimize your sqrt, the fastest one I've seen is the old carmack quake 3 sqrt:
http://betterexplained.com/articles/understanding-quakes-fast-inverse-square-root/
That said, since sqrt is non-linear, you're not going to be able to do simple linear interpolation along your line to get your result. The best idea is to use a table lookup since that will give you blazing fast access to the data. And, since you appear to be iterating by whole integers, a table lookup should be exceedingly accurate.
Well, you can mirror around x=0 to start with (you need only compute n>=0, and the dupe those results to corresponding n<0). After that, I'd take a look at using the derivative on sqrt(a^2+b^2) (or the corresponding sin) to take advantage of the constant dx.
If that's not accurate enough, may I point out that this is a pretty good job for SIMD, which will provide you with a reciprocal square root op on both SSE and VMX (and shader model 2).
This is sort of related to a HAKMEM item:
ITEM 149 (Minsky): CIRCLE ALGORITHM
Here is an elegant way to draw almost
circles on a point-plotting display:
NEW X = OLD X - epsilon * OLD Y
NEW Y = OLD Y + epsilon * NEW(!) X
This makes a very round ellipse
centered at the origin with its size
determined by the initial point.
epsilon determines the angular
velocity of the circulating point, and
slightly affects the eccentricity. If
epsilon is a power of 2, then we don't
even need multiplication, let alone
square roots, sines, and cosines! The
"circle" will be perfectly stable
because the points soon become
periodic.
The circle algorithm was invented by
mistake when I tried to save one
register in a display hack! Ben Gurley
had an amazing display hack using only
about six or seven instructions, and
it was a great wonder. But it was
basically line-oriented. It occurred
to me that it would be exciting to
have curves, and I was trying to get a
curve display hack with minimal
instructions.