minimum-difference constrained sparse least squares problem (called from python) - least-squares

I'm struggling a bit finding a fast algorithm that's suitable.
I just want to minimize:
norm2(x-s)
st
G.x <= h
x >= 0
sum(x) = R
G is sparse and contains only 1s (and zeros obviously).
In the case of iterative algorithms, it would be nice to get the interim solutions to show to the user.
The context is that s is a vector of current results, and the user is saying "well the sum of these few entries (entries indicated by a few 1.0's in a row in G) should be less than this value (a row in h). So we have to remove quantities from the entries the user specified (indicated by 1.0 entries in G) in a least-squares optimal way, but since we have a global constraint on the total (R) the values removed need to be allocated in a least-squares optimal way amongst the other entries. The entries can't go negative.
All the algorithms I'm looking at are much more general, and as a result are much more complex. Also, they seem quite slow. I don't see this as a complex problem, although mixes of equality and inequality constraints always seem to make things more complex.
This has to be called from Python, so I'm looking at Python libraries like qpsolvers and scipy.optimize. But I suppose Java or C++ libraries could be used and called from Python, which might be good since multithreading is better in Java and C++.
Any thoughts on what library/package/approach to use to best solve this problem?
The size of the problem is about 150,000 rows in s, and a few dozen rows in G.
Thanks!

Your problem is a linear least squares:
minimize_x norm2(x-s)
such that G x <= h
x >= 0
1^T x = R
Thus it fits the bill of the solve_ls function in qpsolvers.
Here is an instance of how I imagine your problem matrices would look like, given what you specified. Since it is sparse we should use SciPy CSC matrices, and regular NumPy arrays for vectors:
import numpy as np
import scipy.sparse as spa
n = 150_000
# minimize 1/2 || x - s ||^2
R = spa.eye(n, format="csc")
s = np.array(range(n), dtype=float)
# such that G * x <= h
G = spa.diags(
diagonals=[
[1.0 if i % 2 == 0 else 0.0 for i in range(n)],
[1.0 if i % 3 == 0 else 0.0 for i in range(n - 1)],
[1.0 if i % 5 == 0 else 0.0 for i in range(n - 1)],
],
offsets=[0, 1, -1],
)
a_dozen_rows = np.linspace(0, n - 1, 12, dtype=int)
G = G[a_dozen_rows]
h = np.ones(12)
# such that sum(x) == 42
A = spa.csc_matrix(np.ones((1, n)))
b = np.array([42.0]).reshape((1,))
# such that x >= 0
lb = np.zeros(n)
Next, we can solve this problem with:
from qpsolvers import solve_ls
x = solve_ls(R, s, G, h, A, b, lb, solver="osqp", verbose=True)
Here I picked CVXOPT but there are other open-source solvers you can install such as ProxQP, OSQP or SCS. You can install a set of open-source solvers by: pip install qpsolvers[open_source_solvers]. After some solvers are installed, you can list those for sparse matrices by:
print(qpsolvers.sparse_solvers)
Finally, here is some code to check that the solution returned by the solver satisfies our constraints:
tol = 1e-6 # tolerance for checks
print(f"- Objective: {0.5 * (x - s).dot(x - s):.1f}")
print(f"- G * x <= h: {(G.dot(x) <= h + tol).all()}")
print(f"- x >= 0: {(x + tol >= 0.0).all()}")
print(f"- sum(x) = {x.sum():.1f}")
I just tried it with OSQP (adding the eps_rel=1e-5 keyword argument when calling solve_ls, otherwise the returned solution would be less accurate than the tol = 1e-6 tolerance) and it found a solution is 737 milliseconds on my (rather old) CPU with:
- Objective: 562494373088866.8
- G * x <= h: True
- x >= 0: True
- sum(x) = 42.0
Hoping this helps. Happy solving!

Related

Vectorize a function with a condition

I would like to vectorize a function with a condition, meaning to calculate its values with array arithmetic. np.vectorize handles vectorization, but it does not work with array arithmetic, so it is not a complete solution
An answer was given as the solution in the question "How to vectorize a function which contains an if statement?" but did not prevent errors here; see the MWE below.
import numpy as np
def myfx(x):
return np.where(x < 1.1, 1, np.arcsin(1 / x))
y = myfx(x)
This runs but raises the following warnings:
<stdin>:2: RuntimeWarning: divide by zero encountered in true_divide
<stdin>:2: RuntimeWarning: invalid value encountered in arcsin
What is the problem, or is there a better way to do this?
I think this could be done by
Getting the indices ks of x for which x[k] > 1.1 for each k in ks.
Applying np.arcsin(1 / x[ks]) to the slice x[ks], and using 1 for the rest of the elements.
Recombining the arrays.
I am not sure about the efficiency, though.
The statement np.where(x < 1.1, 1, np.arcsin(1 / x)) is equivalent to
mask = x < 1.1
a = 1
b = np.arcsin(1 / x)
np.where(mask, a, b)
Notice that you're calling np.arcsin on all the elements of x, regardless of whether 1 / x <= 1 or not. Your basic plan is correct. You can do the operations in-place on an output array using the where keyword of np.arcsin and np.reciprocal, without having to recombine anything:
def myfx(x):
mask = (x >= 1.1)
out = np.ones(x.shape)
np.reciprocal(x, where=mask, out=out) # >= 1.1 implies != 0
return np.arcsin(out, where=mask, out=out)
Using np.ones ensures that the unmasked elements of out are initialized correctly. An equivalent method would be
out = np.empty(x.shape)
out[~mask] = 1
You can always find an arithmetic expression that prevents the "divide by zero".
Example:
def myfx(x):
return np.where( x < 1.1, 1, np.arcsin(1/np.maximum(x, 1.1)) )
The values where x<1.1 in the right wing are not used, so it's not an issue computing np.arcsin(1/1.1) where x < 1.1.

CVXPY constraint: variable == 0 OR variable >= min

I am trying to solve a portfolio optimisation problem with the constraint that weights can be either zero or at least min (a Nx1 vector).
import cvxpy as cp
w = cp.Variable(len(mu))
mins = np.ones(len(mu)) * 0.03
risk = cp.quad_form(w, S)
prob = cp.Problem(cp.Minimize(risk),
[cp.sum(w) == 1,
w >= 0,
w >= min OR w == 0 # pseudocode for my desired constraint]
This is equivalent to a constraint that the weights are NOT 0 < w <= min, but I cannot find a way to express this in CVXPY (I have googled things like "cvxpy OR constraint" to no avail).
It feels like I'm missing something obvious. Perhaps there is a solution involving some boolean vector?
This is called w being a semi-continuous variable. Most advanced solvers support this type of variable directly. As CVXPY does not understand semi-continuous variables, we can use binary variables δ ∈ {0,1} and form the constraints:
δ⋅min ≤ w ≤ δ⋅max
where we can set max=1.
This makes the problem a MIQP (Mixed-Integer Quadratic Programming) problem. This usually means that you need to use a high-end solver that supports this type of model.
Based on Erwin's answer, this is the working code.
import cvxpy as cp
w = cp.Variable(n)
mins = np.ones(n) * 0.03
maxs = np.ones(n)
risk = cp.quad_form(w, S)
prob = cp.Problem(cp.Minimize(risk),
[cp.sum(w) == 1,
w >= 0,
w >= cp.multiply(k, mins),
w <= cp.multiply(k, maxs)])
prob.solve(solver="ECOS_BB")
EDIT: changed k # mins to cp.multiply(k, mins) as per comment

Algorithms for factorizing a 30 decimal digit number

My professor has given me an RSA factoring problem has assignment. The given modulus is 30 decimal digits long. I have been searching a lot about factoring algorithms. But it has been quite a headache to choose one for my given requirements. Which all algorithms give better performance for 30 decimal digit numbers?
Note: So far I have read about Brute force approach and Quadratic Sieve. The latter is complex and the former time consuming.
There's another method called Pollard's Rho algorithm, which is not as fast as the GNFS but is capable of factoring 30-digit numbers in minutes rather than hours.
The algorithm is very simple. It stops when it finds any factor, so you'll need to call it recursively to obtain a complete factorisation. Here's a basic implementation in Python:
def rho(n):
def gcd(a, b):
while b > 0:
a, b = b, a%b
return a
g = lambda z: (z**2 + 1) % n
x, y, d = 2, 2, 1
while d == 1:
x = g(x)
y = g(g(y))
d = gcd(abs(x-y), n)
if d == n:
print("Can't factor this, sorry.")
print("Try a different polynomial for g(), maybe?")
else:
print("%d = %d * %d" % (n, d, n // d))
rho(441693463910910230162813378557) # = 763728550191017 * 578338290221621
Or you could just use an existing software library. I can't see much point in reinventing this particular wheel.

Beginner Finite Elemente Code does not solve equation properly

I am trying to write the code for solving the extremely difficult differential equation:
x' = 1
with the finite element method.
As far as I understood, I can obtain the solution u as
with the basis functions phi_i(x), while I can obtain the u_i as the solution of the system of linear equations:
with the differential operator D (here only the first derivative). As a basis I am using the tent function:
def tent(l, r, x):
m = (l + r) / 2
if x >= l and x <= m:
return (x - l) / (m - l)
elif x < r and x > m:
return (r - x) / (r - m)
else:
return 0
def tent_half_down(l,r,x):
if x >= l and x <= r:
return (r - x) / (r - l)
else:
return 0
def tent_half_up(l,r,x):
if x >= l and x <= r:
return (x - l) / (r - l)
else:
return 0
def tent_prime(l, r, x):
m = (l + r) / 2
if x >= l and x <= m:
return 1 / (m - l)
elif x < r and x > m:
return 1 / (m - r)
else:
return 0
def tent_half_prime_down(l,r,x):
if x >= l and x <= r:
return - 1 / (r - l)
else:
return 0
def tent_half_prime_up(l, r, x):
if x >= l and x <= r:
return 1 / (r - l)
else:
return 0
def sources(x):
return 1
Discretizing my space:
n_vertex = 30
n_points = (n_vertex-1) * 40
space = (0,5)
x_space = np.linspace(space[0],space[1],n_points)
vertx_list = np.linspace(space[0],space[1], n_vertex)
tent_list = np.zeros((n_vertex, n_points))
tent_prime_list = np.zeros((n_vertex, n_points))
tent_list[0,:] = [tent_half_down(vertx_list[0],vertx_list[1],x) for x in x_space]
tent_list[-1,:] = [tent_half_up(vertx_list[-2],vertx_list[-1],x) for x in x_space]
tent_prime_list[0,:] = [tent_half_prime_down(vertx_list[0],vertx_list[1],x) for x in x_space]
tent_prime_list[-1,:] = [tent_half_prime_up(vertx_list[-2],vertx_list[-1],x) for x in x_space]
for i in range(1,n_vertex-1):
tent_list[i, :] = [tent(vertx_list[i-1],vertx_list[i+1],x) for x in x_space]
tent_prime_list[i, :] = [tent_prime(vertx_list[i-1],vertx_list[i+1],x) for x in x_space]
Calculating the system of linear equations:
b = np.zeros((n_vertex))
A = np.zeros((n_vertex,n_vertex))
for i in range(n_vertex):
b[i] = np.trapz(tent_list[i,:]*sources(x_space))
for j in range(n_vertex):
A[j, i] = np.trapz(tent_prime_list[j] * tent_list[i])
And then solving and reconstructing it
u = np.linalg.solve(A,b)
sol = tent_list.T.dot(u)
But it does not work, I am only getting some up and down pattern. What am I doing wrong?
First, a couple of comments on terminology and notation:
1) You are using the weak formulation, though you've done this implicitly. A formulation being "weak" has nothing to do with the order of derivatives involved. It is weak because you are not satisfying the differential equation exactly at every location. FE minimizes the weighted residual of the solution, integrated over the domain. The functions phi_j actually discretize the weighting function. The difference when you only have first-order derivatives is that you don't have to apply the Gauss divergence theorem (which simplifies to integration by parts for one dimension) to eliminate second-order derivatives. You can tell this wasn't done because phi_j is not differentiated in the LHS.
2) I would suggest not using "A" as the differential operator. You also use this symbol for the global system matrix, so your notation is inconsistent. People often use "D", since this fits better to the idea that it is used for differentiation.
Secondly, about your implementation:
3) You are using way more integration points than necessary. Your elements use linear interpolation functions, which means you only need one integration point located at the center of the element to evaluate the integral exactly. Look into the details of Gauss quadrature to see why. Also, you've specified the number of integration points as a multiple of the number of nodes. This should be done as a multiple of the number of elements instead (in your case, n_vertex-1), because the elements are the domains on which you're integrating.
4) You have built your system by simply removing the two end nodes from the formulation. This isn't the correct way to specify boundary conditions. I would suggesting building the full system first and using one of the typical methods for applying Dirichlet boundary conditions. Also, think about what constraining two nodes would imply for the differential equation you're trying to solve. What function exists that satisfies x' = 1, x(0) = 0, x(5) = 0? You have overconstrained the system by trying to apply 2 boundary conditions to a first-order differential equation.
Unfortunately, there isn't a small tweak that can be made to get the code to work, but I hope the comments above help you rethink your approach.
EDIT to address your changes:
1) Assuming the matrix A is addressed with A[row,col], then your indices are backwards. You should be integrating with A[i,j] = ...
2) A simple way to apply a constraint is to replace one row with the constraint desired. If you want x(0) = 0, for example, set A[0,j] = 0 for all j, then set A[0,0] = 1 and set b[0] = 0. This substitutes one of the equations with u_0 = 0. Do this after integrating.

Coordinate Descent Algorithm in Julia for Least Squares not converging

As a warm-up to writing my own elastic net solver, I'm trying to get a fast enough version of ordinary least squares implemented using coordinate descent.
I believe I've implemented the coordinate descent algorithm correctly, but when I use the "fast" version (see below), the algorithm is insanely unstable, outputting regression coefficients that routinely overflow a 64-bit float when the number of features is of moderate size compared to the number of samples.
Linear Regression and OLS
If b = A*x, where A is a matrix, x a vector of the unknown regression coefficients, and y is the output, I want to find x that minimizes
||b - Ax||^2
If A[j] is the jth column of A and A[-j] is A without column j, and the columns of A are normalized so that ||A[j]||^2 = 1 for all j, the coordinate-wise update is then
Coordinate Descent:
x[j] <-- A[j]^T * (b - A[-j] * x[-j])
I'm following along with these notes (page 9-10) but the derivation is simple calculus.
It's pointed out that instead of recomputing A[j]^T(b - A[-j] * x[-j]) all the time, a faster way to do it is with
Fast Coordinate Descent:
x[j] <-- A[j]^T*r + x[j]
where the total residual r = b - Ax is computed outside the loop over coordinates. The equivalence of these update rules follows from noting that Ax = A[j]*x[j] + A[-j]*x[-j] and rearranging terms.
My problem is that while the second method is indeed faster, it's wildly numerically unstable for me whenever the number of features isn't small compared to the number of samples. I was wondering if anyone might have some insight as to why that's the case. I should note that the first method, which is more stable, still starts disagreeing with more standard methods as the number of features approaches the number of samples.
Julia code
Below is some Julia code for the two update rules:
function OLS_builtin(A,b)
x = A\b
return(x)
end
function OLS_coord_descent(A,b)
N,P = size(A)
x = zeros(P)
for cycle in 1:1000
for j = 1:P
x[j] = dot(A[:,j], b - A[:,1:P .!= j]*x[1:P .!= j])
end
end
return(x)
end
function OLS_coord_descent_fast(A,b)
N,P = size(A)
x = zeros(P)
for cycle in 1:1000
r = b - A*x
for j = 1:P
x[j] += dot(A[:,j],r)
end
end
return(x)
end
Example of the problem
I generate data with the following:
n = 100
p = 50
σ = 0.1
β_nz = float([i*(-1)^i for i in 1:10])
β = append!(β_nz,zeros(Float64,p-length(β_nz)))
X = randn(n,p); X .-= mean(X,1); X ./= sqrt(sum(abs2(X),1))
y = X*β + σ*randn(n); y .-= mean(y);
Here I use p=50, and I get good agreement between OLS_coord_descent(X,y) and OLS_builtin(X,y), whereas OLS_coord_descent_fast(X,y)returns exponentially large values for the regression coefficients.
When p is less than about 20, OLS_coord_descent_fast(X,y) agrees with the other two.
Conjecture
Since things agrees for the regime of p << n, I think the algorithm is formally correct, but numerically unstable. Does anyone have any thoughts on whether this guess is correct, and if so how to correct for the instability while retaining (most) of the performance gains of the fast version of the algorithm?
The quick answer: You forgot to update r after each x[j] update. Following is the fixed function which behaves like OLS_coord_descent:
function OLS_coord_descent_fast(A,b)
N,P = size(A)
x = zeros(P)
for cycle in 1:1000
r = b - A*x
for j = 1:P
x[j] += dot(A[:,j],r)
r -= A[:,j]*dot(A[:,j],r) # Add this line
end
end
return(x)
end