Mathematica exponentiation and finding a specified coefficient - optimization
I have the following code, and it does exactly what I want it to do, except that it is ridiculously slow. It would not bother me so much, except that when I run the code "manually", i.e., break it into parts and evaluate them individually, it is nearly instantaneous.
Here is my code:
Coefficient[Product[Sum[x^(j*Prime[i]), {j, 0, Floor[q/Prime[i]]}],
{i, 1, PrimePi[q]}], x, q]
I think Mathematica is trying to symbolically simplify the sum, but I am not sure. Is there a way to stop that?
In addition, since all my coefficients are positive and I only want the coefficient of x^q, is there a way to get Mathematica to discard all terms with exponents larger than q and skip the multiplications involving them?
I may be misunderstanding what you want, but as the coefficient will depend on q, I assume you want it evaluated for a specific q. Since I suspected (like you) that the time is taken to optimise the product and sum, I rewrote it. You had something like:
With[{q = 80},
 Coefficient[
  Product[Sum[x^(j*Prime[i]), {j, 0, Floor[q/Prime[i]]}],
   {i, 1, PrimePi[q]}], x, q]] // Timing
(*
-> {8.36181, 10003}
*)
which I rewrote with purely structural operations as
With[{q = 80},
 Coefficient[Times @@
  Table[Plus @@ Table[x^(j*Prime[i]), {j, 0, Floor[q/Prime[i]]}],
{i, 1, PrimePi[q]}], x, q]] // Timing
(*
-> {8.36357, 10003}
*)
(this just builds up a list of the terms and then multiplies them, so no symbolic analysis is performed).
Just building up the polynomial is instantaneous, but it has a few thousand terms, so what is probably happening is that Coefficient spends a lot of time making sure it has the right coefficient. You can solve this by Expanding the polynomial first. Thus:
With[{q = 80},
 Coefficient[
  Expand[Product[Sum[x^(j*Prime[i]), {j, 0, Floor[q/Prime[i]]}],
    {i, 1, PrimePi[q]}]], x, q]] // Timing
(*
-> {0.240862, 10003}
*)
and it also works for my method.
So to summarise, just wrap the expression in Expand before you take the coefficient.
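As a sketch, here is the same recipe packaged as a single reusable definition (the name expandedCoeff is my own; the value 10003 for q = 80 is the one reported in the timings above):

expandedCoeff[q_] := Coefficient[
  Expand@Product[Sum[x^(j*Prime[i]), {j, 0, Floor[q/Prime[i]]}],
    {i, 1, PrimePi[q]}], x, q]

expandedCoeff[80]
(* -> 10003 *)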
I think the reason the original code is slow is that Coefficient is made to work even with very large expressions, ones that would not fit into memory if naively expanded.
Here's the original polynomial:
poly[q_, x_] := Product[Sum[ x^(j*Prime[i]),
{j, 0, Floor[q/Prime[i]]}], {i, 1, PrimePi[q]}]
See how, even for not-too-large q, the expanded polynomial takes up a lot more memory and expanding it is fairly slow:
In[2]:= Through[{LeafCount, ByteCount}[poly[300, x]]] // Timing
Through[{LeafCount, ByteCount}[Expand@poly[300, x]]] // Timing
Out[2]= { 0.01, { 1859, 55864}}
Out[3]= {25.27, {77368, 3175840}}
Now let's define the coefficient in three different ways and time them:
coeff[q_] := Module[{x}, Coefficient[poly[q, x], x, q]]
exCoeff[q_] := Module[{x}, Coefficient[Expand@poly[q, x], x, q]]
serCoeff[q_] := Module[{x}, SeriesCoefficient[poly[q, x], {x, 0, q}]]
In[7]:= Table[ coeff[q],{q,1,30}]//Timing
Table[ exCoeff[q],{q,1,30}]//Timing
Table[serCoeff[q],{q,1,30}]//Timing
Out[7]= {0.37,{0,1,1,1,2,2,3,3,4,5,6,7,9,10,12,14,17,19,23,26,30,35,40,46,52,60,67,77,87,98}}
Out[8]= {0.12,{0,1,1,1,2,2,3,3,4,5,6,7,9,10,12,14,17,19,23,26,30,35,40,46,52,60,67,77,87,98}}
Out[9]= {0.06,{0,1,1,1,2,2,3,3,4,5,6,7,9,10,12,14,17,19,23,26,30,35,40,46,52,60,67,77,87,98}}
In[10]:= coeff[100]//Timing
exCoeff[100]//Timing
serCoeff[100]//Timing
Out[10]= {56.28,40899}
Out[11]= { 0.84,40899}
Out[12]= { 0.06,40899}
So SeriesCoefficient is definitely the way to go, unless of course you're a bit better at combinatorics than I am and you know the following prime partition formulae (OEIS):
In[13]:= CoefficientList[Series[1/Product[1-x^Prime[i],{i,1,30}],{x,0,30}],x]
Out[13]= {1,0,1,1,1,2,2,3,3,4,5,6,7,9,10,12,14,17,19,23,26,30,35,40,46,52,60,67,77,87,98}
In[14]:= f[n_] := Length@IntegerPartitions[n, All, Prime@Range@PrimePi@n]; Array[f, 30]
Out[14]= {0,1,1,1,2,2,3,3,4,5,6,7,9,10,12,14,17,19,23,26,30,35,40,46,52,60,67,77,87,98}
Related
Using fixed point to show square root
In going through the exercises of SICP, it defines a fixed point of a function F as a value x that satisfies the equation F(x) = x, found by iterating until the function stops changing, for example F(F(F(x))). The thing I don't understand is how the square root of, say, 9 has anything to do with that. For example, if I have F(x) = sqrt(9), obviously x = 3. Yet how does that relate to doing F(F(F(x))) --> sqrt(sqrt(sqrt(9))), which I believe just converges to one:

>>> math.sqrt(math.sqrt(math.sqrt(math.sqrt(math.sqrt(math.sqrt(9))))))
1.0349277670798647

since F(x) = sqrt(x) when x = 1. In other words, how does finding the square root of a constant have anything to do with finding fixed points of functions?
When calculating the square root of a number, say a, you essentially have an equation of the form x^2 - a = 0. That is, to find the square root of a, you have to find an x such that x^2 = a, or x^2 - a = 0; call the latter equation (1).

The form given in (1) is an equation of the form g(x) = 0, where g(x) := x^2 - a. To use the fixed-point method for calculating the roots of this equation, you have to make some subtle modifications to bring it to the form f(x) = x. One way to do this is to rewrite (1) as x = a/x; call it (2). In (2) you have obtained the form required for solving an equation by the fixed-point method: f(x) is a/x. Observe that this method requires both sides of the equation to have an x term; an equation of the form sqrt(a) = x doesn't meet that specification and hence can't be solved (iteratively) using the fixed-point method.

"The thing I don't understand is how a square root of, say, 9 has anything to do with that. For example, if I have F(x) = sqrt(9), obviously x=3. Yet, how does that relate to doing: F(F(F(x))) --> sqrt(sqrt(sqrt(9)))"

These are standard methods for the numerical calculation of roots of non-linear equations, quite a complex topic on its own and one which is usually covered in engineering courses. So don't worry if you don't get the hang of it; the authors probably felt it was a good example of iterative problem solving.
You need to convert the problem f(x) = 0 into a fixed-point problem g(x) = x that is likely to converge to the root of f(x). In general, the choice of g(x) is tricky. If f(x) = x^2 - a = 0, then you should choose g(x) as follows:

g(x) = 1/2*(x + a/x)

(This choice is based on Newton's method, which is a special case of fixed-point iteration.) To find the square root sqrt(a): guess an initial value x0; then, given a tolerance eps, compute x_(n+1) = 1/2*(x_n + a/x_n) for n = 0, 1, ... until convergence.
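A minimal Mathematica sketch of this iteration (the built-in FixedPoint and the toy value a = 9 are my own illustration, not part of the answer):

a = 9;
(* Iterate the averaged map g(x) = (x + a/x)/2 until the value stops changing. *)
FixedPoint[(# + a/#)/2 &, 1.0]
(* -> 3. *)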
Relaxation of linear constraints?
When we need to optimize a function on the positive real half-line, and we only have unconstrained optimization routines, we use y = exp(x) or y = x^2 to map to the real line and optimize over the log or the (signed) square root of the variable instead. Can we do something similar for linear constraints of the form Ax = b, where, for x a d-dimensional vector, A is an (N, d)-shaped matrix and b is a vector of length N, defining the constraints?
While, as Erwin Kalvelagen says, this is not always a good idea, here is one way to do it.

Suppose we take the SVD of A, getting A = U*S*V', where if A is n x m, then U is n x n orthogonal, S is n x m and zero off the main diagonal, and V is m x m orthogonal. Computing the SVD is not a trivial computation. We first zero out the elements of S which we think are non-zero just due to noise, which can be a slightly delicate thing to do. Then we can find one solution x~ to A*x = b as

x~ = V*pinv(S)*U'*b

(where pinv(S) is the pseudo-inverse of S, i.e. replace the non-zero elements of the diagonal by their multiplicative inverses). Note that x~ is a least-squares solution to the constraints, so we need to check that it is close enough to being a real solution, i.e. that A*x~ is close enough to b, which is another somewhat delicate thing. If x~ doesn't satisfy the constraints closely enough you should give up: if the constraints have no solution, neither does the optimisation.

Any other solution to the constraints can be written

x = x~ + sum c[i]*V[i]

where the V[i] are the columns of V corresponding to entries of S that are (now) zero, and the c[i] are arbitrary constants. So we can change variables to use the c[] in the optimisation, and the constraints will be automatically satisfied. However, this change of variables could be somewhat irksome!
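A minimal Mathematica sketch of this parametrisation (the toy A and b are my own; here NullSpace supplies the directions corresponding to zero singular values, and PseudoInverse gives the least-squares particular solution):

A = N@{{1, 1, 0}, {0, 1, 1}}; b = {1., 2.};  (* toy constraints A.x == b *)
xTilde = PseudoInverse[A].b;                 (* least-squares particular solution x~ *)
nullBasis = NullSpace[A];                    (* basis vectors of the null space of A *)
x[c_] := xTilde + Transpose[nullBasis].c     (* every solution: x~ + Sum c[i] V[i] *)
A.x[{0.7}] - b                               (* ~ {0., 0.}: constraints hold for any c *)

An unconstrained optimiser can then work directly on the components of c.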
Overflows and spikes while doing gradient descent
I am trying to find a function h(r) that minimises a functional H(h) by a very simple gradient descent algorithm. The result of H(h) is a single number. (Basically, I have a field configuration in space and I am trying to minimise the energy due to this field.) The field is discretized in space, and the gradient used is the derivative of H with respect to the field's value at each discrete point in space.

I am doing this in Mathematica, but I think the code is easy to follow for non-users of Mathematica. The function Hamiltonian takes a vector of field values, the spacing of points d, and the number of points imax, and gives the value of the energy. EderhSym is the function that gives a table of values for the derivatives at each point. I wrote the derivative function manually to save computation time. (The details of these two functions are probably irrelevant for the question.)

Hamiltonian[hvect_, d_, imax_] :=
  Sum[(i^2/2)*d*(hvect[[i + 1]] - hvect[[i]])^2, {i, 1, imax - 1}] +
  Sum[i^2*d^3*Potential[hvect[[i]]], {i, 1, imax}]

EderhSym[hvect_, d_, imax_] :=
  Join[{0},
   Table[2 i^2 d hvect[[i]] - i (i + 1) d hvect[[i + 1]] -
     i (i - 1) d hvect[[i - 1]] + i^2 d^3 Vderh[hvect[[i]]],
    {i, 2, imax - 1}], {0}]

The code below shows a single iteration of gradient descent. hvect1 is some starting configuration that I have guessed using physical principles.

Ederh = EderhSym[hvect1, d, imax];
hvect1 = hvect1 - StepSize*Ederh;

The problem is that I am getting random spikes in the derivative table. The spikes keep growing until there is an overflow. I have tried changing the step size, and I have tried using moving averages, low-pass filters, Gaussian filters, etc. I still get spikes that cause an overflow. Has anyone encountered this? Is it a problem with the way I am setting up gradient descent?

Aside: I am testing my gradient descent code because I will have to adapt it to a different multivariable Hamiltonian, where I do the following to find a saddle point instead (n is an appropriately chosen small number, 10 in my case):

For[c = 1, c <= n, c++,
  Ederh = EderhSym[hvect1, \[CapitalDelta], imax];
  hvect = hvect - \[CapitalDelta]\[Tau]*Ederh;
];
Ederh = EderhSym[hvect1, \[CapitalDelta], imax];
hvect1 = hvect1 + n \[CapitalDelta]\[Tau]*Ederh;

Edit: It seems to be working with a much smaller step size than I had previously tried (although convergence is slow). I believe that was the problem, but I do not see why the divergences are localised at particular points.
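A sketch of a safeguard against the growing spikes (my own addition, not from the question; it assumes Hamiltonian, EderhSym, Potential, Vderh, hvect1, d, imax and StepSize are defined as above): accept a gradient step only if it lowers the energy, otherwise halve the step size and retry.

Do[
  Ederh = EderhSym[hvect1, d, imax];
  trial = hvect1 - StepSize*Ederh;
  If[Hamiltonian[trial, d, imax] <= Hamiltonian[hvect1, d, imax],
    hvect1 = trial,           (* accept the step *)
    StepSize = StepSize/2     (* otherwise shrink the step and try again *)
  ],
  {100}]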
How to maximize the log-likelihood for a Gaussian Process in Mathematica
I am currently trying to implement a Gaussian Process in Mathematica and am stuck with the maximization of the log-likelihood. I tried to use FindMaximum on my log-likelihood function, but this does not seem to work on this function.

gpdata = {{-1.5, -1.8}, {-1., -1.2}, {-0.75, -0.4}, {-0.4, 0.1},
   {-0.25, 0.5}, {0., 0.8}};

kernelfunction[i_, j_, h0_, h1_] :=
  h0*h0*Exp[-(gpdata[[i, 1]] - gpdata[[j, 1]])^2/(2*h1^2)] +
  KroneckerDelta[i, j]*0.09;

covariancematrix[h0_, h1_] =
  ParallelTable[kernelfunction[i, j, h0, h1], {i, 1, 6}, {j, 1, 6}];

loglikelihood[h0_, h1_] :=
  -0.5*gpdata[[All, 2]].LinearSolve[covariancematrix[h0, h1],
      gpdata[[All, 2]], Method -> "Cholesky"] -
   0.5*Log[Det[covariancematrix[h0, h1]]] - 3*Log[2*Pi];

FindMaximum[loglikelihood[a, b], {{a, 1}, {b, 1.1}},
 MaxIterations -> 500, Method -> "QuasiNewton"]

In the log-likelihood I would usually have the product of the inverse of the covariance matrix with the gpdata[[All, 2]] vector, but because the covariance matrix is always positive semidefinite I wrote it this way. Also, the evaluation does not stop if I use

gpdata[[All, 2]].Inverse[covariancematrix[h0, h1]].gpdata[[All, 2]]

Has anyone an idea? I am actually working on a far more complicated problem where I have 6 parameters to optimize, but I already have problems with 2.
In my experience I've seen that second-order methods fail with hyper-parameter optimization more often than gradient-based methods. I think this is because (most?) second-order methods rely on the function being close to a quadratic near the current estimate. Using conjugate gradient or even Powell's (derivative-free) conjugate-direction method has proved successful in my experiments. For the two-parameter case, I would suggest making a contour plot of the hyper-parameter surface for some intuition.
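A rough Mathematica sketch of both suggestions (it assumes the loglikelihood definition from the question; the _?NumericQ wrapper ll, the plot ranges and the starting points are my own choices, and "PrincipalAxis" is Mathematica's built-in derivative-free method in the same family as Powell's):

(* Numeric-only wrapper so neither the plot nor the search tries to
   evaluate the log-likelihood symbolically. *)
ll[h0_?NumericQ, h1_?NumericQ] := loglikelihood[h0, h1]

(* Contour plot of the hyper-parameter surface for some intuition. *)
ContourPlot[ll[h0, h1], {h0, 0.1, 3}, {h1, 0.1, 3}, Contours -> 30]

(* Derivative-free search; "PrincipalAxis" needs two starting values per variable. *)
FindMaximum[ll[a, b], {{a, 0.8, 1.2}, {b, 0.9, 1.4}}, Method -> "PrincipalAxis"]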
Maximizing in Mathematica with multiple maxima
I'm trying to compute the maximum of some function of one variable (plot omitted; it is calculated from a non-trivial convolution, so, no, I don't have an expression for it) using the command:

NMaximize[{f[x], 0 < x < 1}, x, AccuracyGoal -> 4, PrecisionGoal -> 4]

(I'm not that worried about super accuracy; a rough estimate to 10^-4 is already enough.)

The result of this is x* = 0.55, which is not what it should be, i.e. it is picking the third peak. Is there any way of telling Mathematica that the global maximum is the first one when counting from x = 0 (I know this is always true), or to make Mathematica search with a better approach? (Note: I don't want things like a Simulated Annealing approach; each evaluation is very costly!)

Thanks very much!
Try FindMaximum with a starting point of 0 or some similarly small value.
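For instance (a sketch reusing the question's f and tolerance settings; the starting value 0.01 is my own choice):

(* FindMaximum does a local search from the given starting point, so starting
   near x = 0 should converge to the first peak, which is the global maximum. *)
FindMaximum[{f[x], 0 < x < 1}, {x, 0.01},
  AccuracyGoal -> 4, PrecisionGoal -> 4]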