Is it possible to compute the sign of a permutation in linear time? - numpy

I was wondering whether there is a way to compute the sign of a permutation in linear time (or at least in better than n^2 time).
For example, say I have an array of n numbers and I swap two elements of this array, which flips the sign of the permutation. I have a function that computes the sign in n^2 time, but it seems there might be a more efficient algorithm.
Here is a minimal reproducible example that computes it in quadratic time:
import numpy as np
vals = np.arange(1,6,1)
pvals = np.arange(1,6,1)
pvals[0], pvals[1] = pvals[1], pvals[0] #swap
def quadratic(vals):
    sgn_matrix = np.sign(np.expand_dims(vals, -1) - np.expand_dims(vals, -2))
    return np.prod(np.tril(np.ones_like(sgn_matrix)) + np.triu(sgn_matrix, 1))
def sub_quadratic(vals):
    #algorithm quicker than quadratic time?
    pass
sgn = quadratic(vals)
print(sgn) #prints +1
psgn = quadratic(pvals)
print(psgn) #prints -1 (because of one swap)
I have had a look around SO (here, for example) and people keep talking about cyclic permutations, which apparently allow the sign to be computed in linear time, but that's something I'm completely unfamiliar with and I can't find much about it myself.
TL;DR: Does anyone know of a method for computing the sign of a permutation in sub-quadratic time?

Just decompose it into transpositions and check whether you needed an even or odd number of transpositions:
def permutation_sign(perm):
    parity = 1
    perm = perm.copy()
    for i in range(len(perm)):
        while perm[i] != i+1:
            parity *= -1
            j = perm[i] - 1
            # Note: if you try to inline the j computation into the next line,
            # you'll get evaluation order bugs.
            perm[i], perm[j] = perm[j], perm[i]
    return parity
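The "cyclic permutation" idea mentioned in the question leads to the same result without modifying the array: a cycle of length k can be written as k - 1 transpositions, so every even-length cycle flips the sign. The following is only a minimal sketch of that idea, assuming (as in the arrays above) that perm holds the values 1..n exactly once; the name permutation_sign_cycles is made up for illustration.

```python
import numpy as np

def permutation_sign_cycles(perm):
    """Sign of a permutation of 1..n via cycle decomposition, O(n) time."""
    perm = np.asarray(perm) - 1          # switch to 0-based indices
    seen = np.zeros(len(perm), dtype=bool)
    sign = 1
    for start in range(len(perm)):
        if seen[start]:
            continue
        # Walk the cycle containing `start` and measure its length.
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        # A cycle of length k is a product of k - 1 transpositions,
        # so only even-length cycles change the sign.
        if length % 2 == 0:
            sign = -sign
    return sign

print(permutation_sign_cycles([1, 2, 3, 4, 5]))  # identity
print(permutation_sign_cycles([2, 1, 3, 4, 5]))  # one swap
```

Each element is visited once, so the running time is linear in n.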

Related

How to calculate a very large correlation matrix

I have an np.array of observations z where z.shape is (100000, 60). I want to efficiently calculate the 100000x100000 correlation matrix and then write to disk the coordinates and values of just those elements > 0.95 (this is a very small fraction of the total).
My brute-force version of this looks like the following but is, not surprisingly, very slow:
for i1 in range(z.shape[0]):
    for i2 in range(i1+1):
        r = np.corrcoef(z[i1,:],z[i2,:])[0,1]
        if r > 0.95:
            file.write("%6d %6d %.3f\n" % (i1,i2,r))
I realize that the correlation matrix itself could be calculated much more efficiently in one operation using np.corrcoef(z), but the memory requirement is then huge. I'm also aware that one could break up the data set into blocks and calculate bite-size subportions of the correlation matrix at one time, but programming that and keeping track of the indices seems unnecessarily complicated.
Is there another way (e.g., using memmap or pytables) that is both simple to code and doesn't put excessive demands on physical memory?
After experimenting with the memmap solution proposed by others, I found that while it was faster than my original approach (which took about 4 days on my Macbook), it still took a very long time (at least a day), presumably due to inefficient element-by-element writes to the output file. That wasn't acceptable given my need to run the calculation numerous times.
In the end, the best solution (for me) was to sign in to Amazon Web Services EC2 portal, create a virtual machine instance (starting with an Anaconda Python-equipped image) with 120+ GiB of RAM, upload the input data file, and do the calculation (using the matrix multiplication method) entirely in core memory. It completed in about two minutes!
For reference, the code I used was basically this:
import numpy as np
import pickle
import h5py
# read nparray, dimensions (102000, 60)
infile = open(r'file.dat', 'rb')
x = pickle.load(infile)
infile.close()
# z-normalize the data -- first compute means and standard deviations
xave = np.average(x,axis=1)
xstd = np.std(x,axis=1)
# transpose for the sake of broadcasting (doesn't seem to work otherwise!)
ztrans = x.T - xave
ztrans /= xstd
# transpose back
z = ztrans.T
# compute correlation matrix - shape = (102000, 102000)
arr = np.matmul(z, z.T)
# divide by the number of observations per row (60), not the number of rows
arr /= z.shape[1]
# output to HDF5 file
with h5py.File('correlation_matrix.h5', 'w') as hf:
    hf.create_dataset("correlation", data=arr)
From my rough calculations, you want a correlation matrix that has 100,000^2 elements. That takes up around 40 GB of memory, assuming 32-bit floats (twice that for NumPy's default float64).
That probably won't fit in computer memory, otherwise you could just use corrcoef.
There's a fancy approach based on eigenvectors that I can't find right now, and that gets into the (necessarily) complicated category...
Instead, rely on the fact that for zero mean data the covariance can be found using a dot product.
z0 = z - np.mean(z, 1)[:, None]
cov = np.dot(z0, z0.T)
cov /= z.shape[-1]
And this can be turned into the correlation by normalizing by the standard deviations
sigma = np.std(z, 1)
corr = cov
corr /= sigma
corr /= sigma[:, None]
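As a quick sanity check of the two steps above, the result can be compared with np.corrcoef on a small array. This is only an illustrative sketch; the random test data and its shape are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=(50, 60))           # 50 series, 60 observations each

z0 = z - z.mean(axis=1)[:, None]        # remove each row's mean
cov = z0 @ z0.T / z.shape[-1]           # population covariance (divide by n)
sigma = z.std(axis=1)                   # population std (ddof=0), consistent with cov
corr = cov / sigma / sigma[:, None]

matches = np.allclose(corr, np.corrcoef(z))
print(matches)
```

The check only works because the covariance and the standard deviation use the same normalization (both divide by the number of observations).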
Of course memory usage is still an issue.
You can work around this with memory mapped arrays (make sure it's opened for reading and writing) and the out parameter of dot (For another example see Optimizing my large data code with little RAM)
N = z.shape[0]
arr = np.memmap('corr_memmap.dat', dtype='float32', mode='w+', shape=(N,N))
# out must match the result dtype, so z0 has to be float32 here
np.dot(z0, z0.T, out=arr)
arr /= z.shape[-1]
arr /= sigma
arr /= sigma[:, None]
Then you can loop through the resulting array and find the indices with a large correlation coefficient. (You may be able to find them directly with where(arr > 0.95), but the comparison will create a very large boolean array which may or may not fit in memory).
You can use scipy.spatial.distance.pdist with metric='correlation' to get all the correlation distances (1 - r) without the symmetric terms. Unfortunately this will still leave you with about 5e9 terms that will probably overflow your memory.
You could try reformulating a KDTree (which can theoretically handle cosine distance, and therefore correlation distance) to filter for higher correlations, but with 60 dimensions it's unlikely that would give you much speedup. The curse of dimensionality sucks.
Your best bet is probably brute forcing blocks of data using scipy.spatial.distance.cdist(..., metric='correlation'), and then keeping only the high correlations in each block. Once you know how big a block your memory can handle without slowing down due to your computer's memory architecture, it should be much faster than doing one at a time.
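The block idea can be sketched in a few lines. Note that this sketch swaps in a plain NumPy matrix product on z-normalized rows instead of cdist, and that the function name high_correlations, the block size and the test data are all made up for illustration. It assumes no row is constant (otherwise the standard deviation is zero).

```python
import numpy as np

def high_correlations(z, threshold=0.95, block=1024):
    """Yield (i, j, r) for all pairs j < i with Pearson r > threshold."""
    z = np.asarray(z, dtype=np.float64)
    n_obs = z.shape[1]
    # z-normalize each row so that (dot product / n_obs) is the correlation
    zn = z - z.mean(axis=1, keepdims=True)
    zn /= zn.std(axis=1, keepdims=True)
    for start in range(0, zn.shape[0], block):
        stop = min(start + block, zn.shape[0])
        # correlations of rows [start, stop) against rows [0, stop)
        r = zn[start:stop] @ zn[:stop].T / n_obs
        rows, cols = np.nonzero(r > threshold)
        rows += start                      # back to global row indices
        keep = cols < rows                 # lower triangle only, skip the diagonal
        for i, j in zip(rows[keep], cols[keep]):
            yield int(i), int(j), float(r[i - start, j])

# Small made-up example: row 150 is a scaled and shifted copy of row 3.
rng = np.random.default_rng(1)
z = rng.normal(size=(200, 60))
z[150] = 2.0 * z[3] + 1.0
pairs = list(high_correlations(z, threshold=0.95, block=64))
print(pairs)
```

Only one block of the correlation matrix exists in memory at a time, and the index bookkeeping reduces to adding the block offset back to the row indices.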
Please check out the deepgraph package:
https://deepgraph.readthedocs.io/en/latest/tutorials/pairwise_correlations.html
I tried it on z.shape = (2500, 60) with pearsonr for 2500 * 2500 pairs. It was extremely fast.
I'm not sure about 100000 x 100000, but it's worth trying.

numpy.cov or numpy.linalg.eigvals gives wrong results

I have high (100) dimensional data. I want to get the eigenvectors of the covariance matrix of the data.
Cov = numpy.cov(data)
EVs = numpy.linalg.eigvals(Cov)
I get a vector containing some eigenvalues which are complex numbers. This is mathematically impossible. Granted, the imaginary parts of the complex numbers are very small but it still causes issues later on. Is this a numerical issue? If so, does the issue lie with cov, eigvals function or both?
To give more color on that, I did the same calculation in Mathematica, which gives, of course, a correct result. It turns out there are some eigenvalues which are very close to zero but not quite zero, and numpy gets all of these wrong (magnitude-wise, and it turns some of them into complex numbers).
I was facing a similar issue: np.linalg.eigvals was returning a complex vector in which the imaginary part was quasi-zero everywhere.
Using np.linalg.eigvalsh instead fixed it for me.
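This is most probably a numerical issue: eigvals uses a general (non-symmetric) solver, so rounding errors can produce tiny imaginary parts, whereas eigvalsh assumes a symmetric (Hermitian) matrix and always returns real eigenvalues. Note that the ordering may differ: eigvalsh returns the eigenvalues in ascending order.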
The following snippet illustrates the fix:
import numpy as np
from numpy.linalg import eigvalsh, eigvals
D = 10
MUL = 100
EPS = 1e-8
x = np.random.rand(1, D) * MUL
x -= x.mean()
S = np.matmul(x.T, x)
# adding epsilon*I avoids negative eigenvalues due to numerical error
# since the matrix is actually positive semidef. (useful for cholesky etc)
S += np.eye(D, dtype=np.float64) * EPS
print(sorted(eigvalsh(S)))
print(sorted(eigvals(S)))
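Since the question asks for eigenvectors as well, the symmetric counterpart of eig is np.linalg.eigh. The following is a minimal sketch on made-up data (100 variables with only 30 observations, so the covariance matrix is rank-deficient and has many eigenvalues near zero); the shapes and the random data are assumptions, not the asker's dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 30))        # 100 variables, 30 observations each
cov = np.cov(data)                       # 100 x 100 symmetric matrix

# eigh assumes a symmetric matrix: it returns real eigenvalues in ascending
# order and the matching eigenvectors as the columns of `vecs`.
vals, vecs = np.linalg.eigh(cov)

print(vals.dtype)
print(np.allclose(cov @ vecs, vecs * vals))
```

Eigenvalues that should be exactly zero may still come out as tiny positive or negative real numbers, but never as complex ones.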

Find global maximum in the least number of computations

Let's say I have a function f defined on the interval [0,1], which is smooth and increases up to some point a, after which it starts decreasing. I have a grid x[i] on this interval, e.g. with a constant step size of dx = 0.01, and I would like to find which of those points has the highest value, by doing the smallest number of evaluations of f in the worst-case scenario. I think I can do much better than exhaustive search by applying something inspired by gradient-like methods. Any ideas? I was thinking of something like a binary search perhaps, or parabolic methods.
This is a bisection-like method I coded:
from math import floor
def optimize(f, a, b, fa, fb, dx):
    if b - a <= dx:
        return a if fa > fb else b
    else:
        m1 = 0.5*(a + b)
        m1 = _round(m1, a, dx)
        fm1 = fa if m1 == a else f(m1)
        m2 = m1 + dx
        fm2 = fb if m2 == b else f(m2)
        if fm2 >= fm1:
            return optimize(f, m2, b, fm2, fb, dx)
        else:
            return optimize(f, a, m1, fa, fm1, dx)
def _round(x, a, dx, right = False):
    return a + dx*(floor((x - a)/dx) + right)
The idea is: find the middle of the interval and compute m1 and m2, the grid points just to the left and to the right of it. If the function is increasing there, go for the right interval and do the same; otherwise go for the left. Whenever the interval is too small, just compare the values at the ends. However, this algorithm still does not use the magnitude of the derivatives at the points I computed.
Such a function is called unimodal.
Without computing the derivatives, you can work by
finding where the deltas f(x[i+1])-f(x[i]) change sign, by dichotomy (the deltas are positive before the maximum and negative after it); this takes Log2(n) comparisons; this approach is very close to what you describe;
adapting the Golden section method to the discrete case; it takes Logφ(n) comparisons (φ~1.618).
At first sight the Golden section looks more costly, as φ<2, but the dichotomic search actually takes two function evaluations per step, hence 2Log2(n)=Log√2(n) evaluations, and √2<φ.
One can show that this is optimal, i.e. you can't go faster than O(Log(n)) for an arbitrary unimodal function.
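The dichotomic search on the deltas can be sketched as follows. This is a minimal illustration, not a tuned implementation: the name argmax_unimodal and the test function are made up, and f is assumed to be strictly increasing and then strictly decreasing on the grid indices 0..n-1.

```python
from functools import lru_cache

def argmax_unimodal(f, n):
    """Index i in range(n) that maximizes f(i), for f rising and then falling.

    Bisects on the sign of f(i+1) - f(i), using at most about 2*log2(n)
    evaluations of f.
    """
    f = lru_cache(maxsize=None)(f)       # never evaluate the same point twice
    lo, hi = 0, n - 1                    # the maximum lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if f(mid + 1) > f(mid):          # still increasing: maximum is right of mid
            lo = mid + 1
        else:                            # decreasing: maximum is at mid or left of it
            hi = mid
    return lo

# Made-up example on the grid x[i] = i * 0.01, i = 0..100, with the peak at x = 0.37.
evaluated = []                           # records every real evaluation of f

def f_on_grid(i):
    evaluated.append(i)
    x = i * 0.01
    return -(x - 0.37) ** 2

best = argmax_unimodal(f_on_grid, 101)
print(best, len(evaluated))
```

Caching the evaluations matters when f is expensive, because neighbouring steps often revisit the same grid point.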
If your function is very regular, the deltas will vary smoothly. You can think of the interpolation search, which tries to better predict the searched position by a linear interpolation rather than simple halving. In favorable conditions, it can reach O(Log(Log(n))) performance. I don't know of an adaptation of this principle to the Golden search.
Actually, linear interpolation on the deltas is very close to parabolic interpolation on the function values. The latter approach might be the best for you, but you need to be careful about the corner cases.
If derivatives are allowed, you can use any root solving method on the first derivative, knowing that there is an isolated zero in the given interval.
If only the first derivative is available, use regula falsi. If the second derivative is possible as well, you may consider Newton, but prefer a safe bracketing method.
I guess that the benefits of these approaches (superlinear and quadratic convergence) are made a little useless by the fact that you are working on a grid.
DISCLAIMER: I haven't tested the code. Take this as an "inspiration".
Let's say you have the following 11 points
x,f(x) = (0,3),(1,7),(2,9),(3,11),(4,13),(5,14),(6,16),(7,5),(8,3),(9,1),(10,-1)
You can do something inspired by the bisection method:
a = 0, f(a) = 3 | b = 10, f(b) = -1 | c = (0+10)/2 = 5, f(c) = 14
Comparing f(c) with its left neighbour f(c-1) = 13 shows that the function is still increasing at c, so there is no need to search [a,c[ for the maximum. The maximum has to be in the interval [c,b], so at the next iteration you set a = c:
a = 5, f(a) = 14 | b = 10, f(b) = -1 | c = (5+10)/2 = 7 (integer division), f(c) = 5
This time f(c) is smaller than f(c-1) = 16, so the function is already decreasing at c and b is moved to the left: b = c.
You can iterate the process until two of a, b and c coincide.
Here is the code that implements this idea:
#include <stdio.h>
#define STEP (0.01)
#define SIZE (100) /* number of grid points, 1/STEP */
int main(void){
    double vals[SIZE];
    for (int i = 0; i < SIZE; ++i) {
        double x = i*STEP;
        vals[i] = -(x*x*x*x - (0.6)*(x*x));
    }
    for (int i = 0; i < SIZE; ++i) {
        printf("%f ",vals[i]);
    }
    printf("\n");
    int a=0,b=SIZE-1,c;
    double fa=vals[a],fb=vals[b] ,fc;
    c=(a+b)/2;
    fc = vals[c];
    while( a!=b && b!=c && a!=c){
        printf("%i %i %i - %f %f %f\n",a,c,b, fa, fc, fb);
        if(fc - vals[c-1] > 0){ //is the function increasing at c?
            a = c;
        }else{
            b=c;
        }
        c=(a+b)/2;
        fa=vals[a];
        fb=vals[b];
        fc = vals[c];
    }
    printf("The maximum is %i=%f with %f\n", c,(c*STEP),vals[c]);
    return 0;
}
Find the points where the derivative df/dx of f(x) is 0.
For the derivative you could use a five-point stencil or a similar scheme.
This should be O(N).
Then fit those points (where the derivative is 0) with a polynomial regression / least squares regression.
This should also be O(N), assuming all numbers are neighbours.
Then find the top of that curve.
This shouldn't be more than O(M), where M is the resolution of trials for the fit function.
While taking the derivative, you could leap by k-length steps until the derivative changes sign.
When the derivative changes sign, take the square root of k and continue in the reverse direction.
When the derivative changes sign again, take the square root of the new k and change direction again.
Example: leap by 100 elements, find a sign change, set leap=10 and reverse direction; next change ==> leap=3 ... then it could be fixed to 1 element per step to find the exact location.
I am assuming that the function evaluation is very costly.
In the special case that your function can be approximately fitted with a polynomial, you can easily calculate the extremum with the least number of function evaluations. And since you know that there is only one maximum, a polynomial of degree 2 (quadratic) might be ideal.
For example: if f(x) can be represented by a polynomial of some known degree, say 2, then you can evaluate your function at any 3 points and calculate the polynomial coefficients using Newton's divided differences or the Lagrange interpolation method.
Then it's simple to solve for the maximum of this polynomial. For degree 2 you can easily get a closed-form expression for the maximum.
To get the final answer you can then search in the vicinity of the solution.
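For degree 2 the closed form is the vertex of the parabola through three points. The sketch below uses the standard three-point formula; the function name parabola_vertex and the test function are made up for illustration, and it assumes the three x values are distinct.

```python
def parabola_vertex(x0, x1, x2, f0, f1, f2):
    """x-coordinate of the extremum of the parabola through three points."""
    num = (x1 - x0) ** 2 * (f1 - f2) - (x1 - x2) ** 2 * (f1 - f0)
    den = (x1 - x0) * (f1 - f2) - (x1 - x2) * (f1 - f0)
    if den == 0:
        raise ValueError("points are collinear; the parabola has no extremum")
    return x1 - 0.5 * num / den

# Made-up example: an exactly quadratic function with its maximum at x = 0.3.
def f(x):
    return -(x - 0.3) ** 2

xv = parabola_vertex(0.0, 0.5, 1.0, f(0.0), f(0.5), f(1.0))
dx = 0.01
i = round(xv / dx)       # nearest grid index; its neighbours should be checked too
print(xv, i)
```

When f is only approximately quadratic, the vertex is a starting guess, and the grid points around index i still have to be evaluated and compared.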

cumulative simpson integration with scipy

I have some code which uses scipy.integrate.cumtrapz to compute the antiderivative of a sampled signal. I would like to use Simpson's rule instead of the trapezoidal rule. However, scipy.integrate.simps seems not to have a cumulative counterpart... Am I missing something? Is there a simple way to get a cumulative integration with scipy.integrate.simps?
You can always write your own:
import numpy as np
def cumsimp(func,a,b,num):
    #Integrate func from a to b using num intervals.
    num*=2
    a=float(a)
    b=float(b)
    h=(b-a)/num
    output=4*func(a+h*np.arange(1,num,2))
    tmp=func(a+h*np.arange(2,num-1,2))
    output[1:]+=tmp
    output[:-1]+=tmp
    output[0]+=func(a)
    output[-1]+=func(b)
    return np.cumsum(output*h/3)
def integ1(x):
    return x
def integ2(x):
    return x**2
def integ0(x):
    return np.ones(np.asarray(x).shape)*5
First look at the sum and derivative of a constant function.
print(cumsimp(integ0,0,10,5))
[ 10. 20. 30. 40. 50.]
print(np.diff(cumsimp(integ0,0,10,5)))
[ 10. 10. 10. 10.]
Now check for a few trivial examples:
print(cumsimp(integ1,0,10,5))
[ 2. 8. 18. 32. 50.]
print(cumsimp(integ2,0,10,5))
[ 2.66666667 21.33333333 72. 170.66666667 333.33333333]
Writing your integrand explicitly is much easier here than reproducing the Simpson's rule function of scipy in this context. Picking intervals is difficult when you are given a single array. Do you either:
Use every other value for the edges of Simpson's rule and the remaining values as centers?
Use the array as edges and interpolate values of centers?
There are also a few options for how you want the intervals summed. These complications could be why it's not coded in scipy.
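Taking the first of the two options listed above (every other sample is a panel edge, the samples in between are centers), a cumulative Simpson integral of sampled data can be sketched with NumPy alone. The function name cumulative_simpson_even is made up, and the sketch assumes equally spaced samples and an odd number of them; the result is only defined at every other sample.

```python
import numpy as np

def cumulative_simpson_even(y, dx):
    """Cumulative Simpson integral of equally spaced samples.

    Needs an odd number of samples. Returns the integral from the first
    sample up to y[0], y[2], y[4], ... (one value per pair of intervals).
    """
    y = np.asarray(y, dtype=float)
    if y.size < 3 or y.size % 2 == 0:
        raise ValueError("need an odd number of samples (at least 3)")
    # Simpson's rule on each panel [x[2k], x[2k+2]] with centre x[2k+1]
    panels = dx / 3.0 * (y[:-2:2] + 4.0 * y[1:-1:2] + y[2::2])
    return np.concatenate(([0.0], np.cumsum(panels)))

# Same test case as integ2 above: x**2 sampled on [0, 10] with step 1.
x = np.linspace(0.0, 10.0, 11)
result = cumulative_simpson_even(x ** 2, dx=1.0)
print(result)
```

Apart from the leading zero, the values agree with the cumsimp(integ2,0,10,5) output shown earlier, since Simpson's rule is exact for polynomials up to degree 3.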
Your question was answered a long time ago, but I came across the same problem recently. I wrote some functions to compute such cumulative integrals for equally spaced points; the code can be found on GitHub. The order of the interpolating polynomials ranges from 1 (trapezoidal rule) to 7. As Daniel pointed out in the previous answer, some choices have to be made on how the intervals are summed, especially at the borders; results may thus be slightly different depending on the package you use. Be also aware that the numerical integration may suffer from Runge's phenomenon (unexpected oscillations) for high orders of polynomials.
Here is an example:
import numpy as np
from scipy import integrate as sp_integrate
from gradiompy import integrate as gp_integrate
# Definition of the function (polynomial of degree 7)
x = np.linspace(-3,3,num=15)
dx = x[1]-x[0]
y = 8*x + 3*x**2 + x**3 - 2*x**5 + x**6 - 1/5*x**7
y_int = 4*x**2 + x**3 + 1/4*x**4 - 1/3*x**6 + 1/7*x**7 - 1/40*x**8
# Cumulative integral using scipy
y_int_trapz = y_int[0] + sp_integrate.cumulative_trapezoid(y,dx=dx,initial=0)
print('Integration error using scipy.integrate:')
print(' trapezoid = %9.5f' % np.linalg.norm(y_int_trapz-y_int))
# Cumulative integral using gradiompy
y_int_trapz = gp_integrate.cumulative_trapezoid(y,dx=dx,initial=y_int[0])
y_int_simps = gp_integrate.cumulative_simpson(y,dx=dx,initial=y_int[0])
print('\nIntegration error using gradiompy.integrate:')
print(' trapezoid = %9.5f' % np.linalg.norm(y_int_trapz-y_int))
print(' simpson = %9.5f' % np.linalg.norm(y_int_simps-y_int))
# Higher order cumulative integrals
for order in range(5,8,2):
    y_int_composite = gp_integrate.cumulative_composite(y,dx,order=order,initial=y_int[0])
    print(' order %i = %9.5f' % (order,np.linalg.norm(y_int_composite-y_int)))
# Display the values of the cumulative integral
print('\nCumulative integral (with initial offset):\n',y_int_composite)
You should get the following result:
'''
Integration error using scipy.integrate:
trapezoid = 176.10502
Integration error using gradiompy.integrate:
trapezoid = 176.10502
simpson = 2.52551
order 5 = 0.48758
order 7 = 0.00000
Cumulative integral (with initial offset):
[-6.90203571e+02 -2.29979407e+02 -5.92267425e+01 -7.66415188e+00
2.64794452e+00 2.25594840e+00 6.61937372e-01 1.14797061e-13
8.20130517e-01 3.61254267e+00 8.55804341e+00 1.48428883e+01
1.97293221e+01 1.64257877e+01 -1.13464286e+01]
'''
I would go with Daniel's solution. But you need to be careful if the function that you are integrating is itself subject to fluctuations. Simpson's requires the function to be well-behaved (meaning in this case, one that is continuous).
There are techniques for making a moderately badly behaved function look like it is better behaved than it really is (really forms of approximation of your function), but in that case you have to be sure that the approximating function "adequately" approximates yours. In that case you might make the intervals non-uniform to handle the problem.
An example might be in considering the flow of a field that, over longer time scales, is approximated by a well-behaved function but which over shorter periods is subject to limited random fluctuations in its density.

np.fft.fft off by a factor of 1000 (fitting a power spectrum)

I'm trying to make a power spectrum from an experimental dataset which I am reading in, and then fit it to a theoretical curve. Everything runs without errors, except that my curve keeps differing from the data by a factor of 1000, and I have absolutely no idea what the problem could be. I've asked a few people, but to no avail. (I hope that you guys will be able to help.)
Anyway, I'm pretty sure that it's not the units, as they were triple-checked by me and two others. Basically, I need to fit a power spectrum to an equation using the least squares method.
I can't post the whole code, as it's rather long and a bit messy, but this is the Fourier part (I added comments to all arrays and variables which have not been declared in the code):
#Calculate stuff
Nm = 10**-6 #micro to meter
KbT = 4.10E-21 #Joule
T = 297. #K
l = zvalue*Nm #meter
meany = np.mean(cleandatay*Nm) #meter (cleandata is the array that I read in from a csv at the start.)
SDy = sum((cleandatay*Nm - meany)**2)/len(cleandatay) #meter^2
FmArray[0][i] = ((KbT*l)/SDy) #N
#print FmArray[0][i]
print(100.0*i/len(filelist)) #how many % done?
#fourier
dt = cleant[1]-cleant[0] #timestep
N = len(cleandatay) #Same for cleant, its the corresponding time to cleandatay
Here is where the fourier part starts, I take the fft and turn it into a powerspectrum. Then I calculate the corresponding freq steps with the array freqs
fouriery = np.fft.fft((cleandatay*(10**-6)))
fourierpower = (np.abs(fouriery))**2
fourierpower = fourierpower[1:N//2] #remove 0th datapoint and /2 (remove negative freqs)
fourierpower = fourierpower*dt #*dt to account for steps
freqs = (1.+np.arange((N/2)-1.))/50.
#Least squares method
eta = 8.9E-4 #pa*s
Rbead = 0.5E-6#meter
constant = 2*KbT/(3*eta*pi*Rbead)
omega = 2*pi*freqs #rad/s
Wcarray = 2.*pi*np.arange(0,30, 0.02003) #0.02 = 30/len(freqs)
ChiSq = np.zeros(len(Wcarray))
for k in range(0, len(Wcarray)):
    Py = (constant / (Wcarray[k]**2 + omega**2))
    ChiSq[k] = sum((fourierpower - Py)**2)
    pylab.loglog(omega, Py)
    print(k*100/len(Wcarray))
index = np.where(ChiSq == min(ChiSq))
cutoffw = Wcarray[index]
Pygoed = (constant / (Wcarray[index]**2 + omega**2))
print(cutoffw)
print(constant)
print(min(ChiSq))
pylab.loglog(omega,ChiSq)
So I have no idea what could be going wrong. I think it's the fft, as nothing else can really go wrong.
Below is the pic I get when I plot all the fit lines against the spectrum, as you can see it is off by about 1000 (actually exactly 1000, as this leaves a least square residue of 10^-22, but I can't just randomly multiply without knowing why)
Just to elaborate on the picture. The green dots are the fft spectrum, the lines are the fits, the red dot is where it thinks the cutoff frequency is, and the blue line is the chi-squared fit, looking for the lowest value.
Take a look at the documentation for the FFT that you are using. Many FFT implementations return an unnormalized result that is N times larger than expected (N = number of samples). Multiplying by 1/N will scale the results back in line. (You said that the result is 1000 too high... could it be that you are using a 1024-point FFT?)
Your library FFT routine might include a scale factor of 1/sqrt(n).
Check the documentation for the fft you used, as the proportion of the scale factor allocated between the fft and the ifft is arbitrary.
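For np.fft.fft specifically, the default convention leaves the forward transform unscaled and puts the whole 1/N factor on the inverse. The sketch below checks this numerically and shows one common way to normalize a two-sided power spectral density; the sample count, the timestep and the random signal are made up, and whether this particular convention is the one the theoretical curve in the question expects is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
N, dt = 1000, 0.05                       # hypothetical sample count and timestep
x = rng.normal(size=N)

X = np.fft.fft(x)                        # default norm: forward unscaled, inverse 1/N

# Parseval for the unscaled DFT: sum |X|^2 = N * sum |x|^2
ratio = np.sum(np.abs(X) ** 2) / np.sum(x ** 2)
print(ratio)                             # approximately N

# One common two-sided PSD convention: |X|^2 * dt / N. Summing it times the
# frequency step df = 1/(N*dt) recovers the mean square of the signal.
psd = np.abs(X) ** 2 * dt / N
df = 1.0 / (N * dt)
print(np.isclose(np.sum(psd) * df, np.mean(x ** 2)))
```

A spectrum computed as |X|^2 * dt without the division by N is therefore too large by a factor of N under this convention.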