Transform a numpy 3D ndarray to a symmetric form with respect to a specific index - numpy

In the case of a matrix mat n x n, i can do the following
sym = 0.5 * (mat + mat.T)
the operation gives the desired result sym[i,j] = sym[j,i]
Suppose we have a 3D array ndarr[i,j,k], where i,j,k 0,1,...n,
then ndarr is n x n x n. The idea is to obtain the following "symmetric" form
nsym[i,j,k] = nsym[j,i,k] using ndarr. I tried this:
import numpy as np
# Generate some random matrix, n = 5
ndarr = np.random.beta(0.1,1,(5,5,5))
# First attempt to symmetrize
sym1 = np.array([0.5*(ndarr[:,:,k]+ndarr[:,:,k].T) for k in range(5)])
The problem here is that sym1[i,j,k] != sym1[j,i,k] as it is required. In fact I obtain sym1[i,j,k] = sym1[i,k,j], symmetric under the exchange of the last two symbols!
# Second attempt
sym2 = 0.5*(ndarr+ndarr.T)
Same problem here and sym2 is symmetric with respect the second index sym2[i,j,k]=sym2[k,j,i].
To resume, the goal is to find a symmetric form for a 3D array with respect to the third index and to preserve the values in the diagonal for the original ndarr[i,i,i].

The problem here is that you're not using the correct transpose:
sym = 0.5 * (ndarr + np.transpose(ndarr, (1, 0, 2)))
By default, np.transpose and the .T property will reverse the order of the axes. In your case, we want to only flip the first two axes: (0,1,2) -> (1,0,2).
EDIT: The reason your first attempt failed is because you were concatenating each symmetrized matrix along the first axis, not the last. It's more clear if you make ndarr with shape (5, 5, 3):
In [16]: sym = np.array([0.5*(ndarr[:,:,k]+ndarr[:,:,k].T) for k in range(3)])
In [17]: sym.shape
Out[17]: (3L, 5L, 5L)
In any case, the version above with np.transpose is cleaner and more efficient.

Related

Is nx.eigenvector_centrality_numpy() using the Arnoldi iteration instead of the basic power method?

Since nx.eigenvector_centrality_numpy() using ARPACK, is it mean that nx.eigenvector_centrality_numpy() using Arnoldi iteration instead of the basic power method?
because when I try to compute manually using the basic power method, the result of my computation is different from the result of nx.eigenvector_centrality_numpy(). Can someone explain it to me?
To make it more clear, here is my code and the result that I got from the function and the result when I compute manually.
import networkx as nx
G = nx.DiGraph()
G.add_edge('a', 'b', weight=4)
G.add_edge('b', 'a', weight=2)
G.add_edge('b', 'c', weight=2)
G.add_edge('b','d', weight=2)
G.add_edge('c','b', weight=2)
G.add_edge('d','b', weight=2)
centrality = nx.eigenvector_centrality_numpy(G, weight='weight')
centrality
The result:
{'a': 0.37796447300922725,
'b': 0.7559289460184545,
'c': 0.3779644730092272,
'd': 0.3779644730092272}
Below is code from Power Method Python Program and I did a little bit of modification:
# Power Method to Find Largest Eigen Value and Eigen Vector
# Importing NumPy Library
import numpy as np
import sys
# Reading order of matrix
n = int(input('Enter order of matrix: '))
# Making numpy array of n x n size and initializing
# to zero for storing matrix
a = np.zeros((n,n))
# Reading matrix
print('Enter Matrix Coefficients:')
for i in range(n):
for j in range(n):
a[i][j] = float(input( 'a['+str(i)+']['+ str(j)+']='))
# Making numpy array n x 1 size and initializing to zero
# for storing initial guess vector
x = np.zeros((n))
# Reading initial guess vector
print('Enter initial guess vector: ')
for i in range(n):
x[i] = float(input( 'x['+str(i)+']='))
# Reading tolerable error
tolerable_error = float(input('Enter tolerable error: '))
# Reading maximum number of steps
max_iteration = int(input('Enter maximum number of steps: '))
# Power Method Implementation
lambda_old = 1.0
condition = True
step = 1
while condition:
# Multiplying a and x
ax = np.matmul(a,x)
# Finding new Eigen value and Eigen vector
x = ax/np.linalg.norm(ax)
lambda_new = np.vdot(ax,x)
# Displaying Eigen value and Eigen Vector
print('\nSTEP %d' %(step))
print('----------')
print('Eigen Value = %0.5f' %(lambda_new))
print('Eigen Vector: ')
for i in range(n):
print('%0.5f\t' % (x[i]))
# Checking maximum iteration
step = step + 1
if step > max_iteration:
print('Not convergent in given maximum iteration!')
break
# Calculating error
error = abs(lambda_new - lambda_old)
print('errror='+ str(error))
lambda_old = lambda_new
condition = error > tolerable_error
I used the same matrix and the result:
STEP 99
----------
Eigen Value = 3.70328
Eigen Vector:
0.51640
0.77460
0.25820
0.25820
errror=0.6172133998483682
STEP 100
----------
Eigen Value = 4.32049
Eigen Vector:
0.71714
0.47809
0.35857
0.35857
Not convergent in given maximum iteration!
I've to try to compute it with my calculator too and I know it's not convergent because |lambda1|=|lambda2|=4. I've to know the theory behind nx.eigenvector_centrality_numpy() properly so I can write it right for my thesis. Help me, please

How can I reconstruct original matrix from SVD components with following shapes?

I am trying to reconstruct the following matrix of shape (256 x 256 x 2) with SVD components as
U.shape = (256, 256, 256)
s.shape = (256, 2)
vh.shape = (256, 2, 2)
I have already tried methods from documentation of numpy and scipy to reconstruct the original matrix but failed multiple times, I think it maybe 3D matrix has a different way of reconstruction.
I am using numpy.linalg.svd for decompostion.
From np.linalg.svd's documentation:
"... If a has more than two dimensions, then broadcasting rules apply, as explained in :ref:routines.linalg-broadcasting. This means that SVD is
working in "stacked" mode: it iterates over all indices of the first
a.ndim - 2 dimensions and for each combination SVD is applied to the
last two indices."
This means that you only need to handle the s matrix (or tensor in general case) to obtain the right tensor. More precisely, what you need to do is pad s appropriately and then take only the first 2 columns (or generally, the number of rows of vh which should be equal to the number of columns of the returned s).
Here is a working code with example for your case:
import numpy as np
mat = np.random.randn(256, 256, 2) # Your matrix of dim 256 x 256 x2
u, s, vh = np.linalg.svd(mat) # Get the decomposition
# Pad the singular values' arrays, obtain diagonal matrix and take only first 2 columns:
s_rep = np.apply_along_axis(lambda _s: np.diag(np.pad(_s, (0, u.shape[1]-_s.shape[0])))[:, :_s.shape[0]], 1, s)
mat_reconstructed = u # s_rep # vh
mat_reconstructed equals to mat up to precision error.

Efficient way to calculate the pairwise matrix product between one tensor and all the rolling of another tensor

Suppose we have two tensors:
tensor A whose shape is (d,m,n)
tensor B whose shape is (d,n,l).
If we want to get the pairwise matrix product of the right-most matrix of A and B, I think we can use np.einsum('dmn,...nl->d...ml',A,B) whose size is (d,d,m,l). However, I would like to get the pairwise product of not all the pairs.
Import a parameter k, 1<=k<=d, I want to get the following pairwise matrix product:
from
A(0,...)#B(0,...)
to
A(0,...)#B(k-1,...)
;
from
A(1,...)#B(1,...)
to
A(1,...)#B(k,...)
;
....
;
from
A(d-2,...)#B(d-2,...),
A(d-2,...)#B(d-1,...)
to
A(d-2,...)#B(k-3,...)
;
from
A(d-1,...)#B(d-1,...)
to
A(d-1,...)#B(k-2,...)
.
Note here we we use a rolling way to deal with tensor B. (like numpy.roll).
Finally, we actually get a tensor whose shape is (d,k,m,l).
What's the most efficient way to do this.
I know several ways like:
First get np.einsum('dmn,...nl->d...ml',A,B), then use a mask to extract the (d,k) pairs.
tile B first, then use einsum in some way.
But I think there exists a better way.
I doubt you can do much better than a for loop. Here is, for example, a vectorized version using einsum and stride_tricks compared to a double for loop:
Code:
from simple_benchmark import BenchmarkBuilder, MultiArgument
import numpy as np
from numpy.lib.stride_tricks import as_strided
B = BenchmarkBuilder()
#B.add_function()
def loopy(A,B,k):
d,m,n = A.shape
l = B.shape[-1]
out = np.empty((d,k,m,l),int)
for i in range(d):
for j in range(k):
out[i,j] = A[i]#B[(i+j)%d]
return out
#B.add_function()
def vectory(A,B,k):
d,m,n = A.shape
l = B.shape[-1]
BB = np.concatenate([B,B[:k-1]],0)
BB = as_strided(BB,(d,k,n,l),np.repeat(BB.strides,(2,1,1)))
return np.einsum("ikl,ijln->ijkn",A,BB)
#B.add_arguments('d x k x m x n x l')
def argument_provider():
for exp in range(10):
d,k,m,n,l = (np.r_[1.6,1.5,1.5,1.5,1.5]**exp*(4,2,2,2,2)).astype(int)
print(d,k,m,n,l)
A = np.random.randint(0,10,(d,m,n))
B = np.random.randint(0,10,(d,n,l))
yield k*d*m*n*l,MultiArgument([A,B,k])
r = B.run()
r.plot()
import pylab
pylab.savefig('diagwa.png')

Difficulty with numpy broadcasting

I have two 2d point clouds (oldPts and newPts) which I whish to combine. They are mx2 and nx2 numpyinteger arrays with m and n of order 2000. newPts contains many duplicates or near duplicates of oldPts and I need to remove these before combining.
So far I have used the histogram2d function to produce a 2d representation of oldPts (H). I then compare each newPt to an NxN area of H and if it is empty I accept the point. This last part I am currently doing with a python loop which i would like to remove. Can anybody show me how to do this with broadcasting or perhaps suggest a completely different method of going about the problem. the working code is below
npzfile = np.load(path+datasetNo+'\\temp.npz')
arrs = npzfile.files
oldPts = npzfile[arrs[0]]
newPts = npzfile[arrs[1]]
# remove all the negative values
oldPts = oldPts[oldPts.min(axis=1)>=0,:]
newPts = newPts[newPts.min(axis=1)>=0,:]
# round to integers
oldPts = np.around(oldPts).astype(int)
newPts = newPts.astype(int)
# put the oldPts into 2d array
H, xedg,yedg= np.histogram2d(oldPts[:,0],oldPts[:,1],
bins = [xMax,yMax],
range = [[0, xMax], [0, yMax]])
finalNewList = []
N = 5
for pt in newPts:
if not H[max(0,pt[0]-N):min(xMax,pt[0]+N),
max(0,pt[1]- N):min(yMax,pt[1]+N)].any():
finalNewList.append(pt)
finalNew = np.array(finalNewList)
The right way to do this is to use linear algebra to compute the distance between each pair of 2-long vectors, and then accept only the new points that are "different enough" from each old point: using scipy.spatial.distance.cdist:
import numpy as np
oldPts = np.random.randn(1000,2)
newPts = np.random.randn(2000,2)
from scipy.spatial.distance import cdist
dist = cdist(oldPts, newPts)
print(dist.shape) # (1000, 2000)
okIndex = np.max(dist, axis=0) > 5
print(np.sum(okIndex)) # prints 1503 for me
finalNew = newPts[okIndex,:]
print(finalNew.shape) # (1503, 2)
Above I use the Euclidean distance of 5 as the threshold for "too close": any point in newPts that's farther than 5 from all points in oldPts is accepted into finalPts. You will have to look at the range of values in dist to find a good threshold, but your histogram can guide you in picking the best one.
(One good way to visualize dist is to use matplotlib.pyplot.imshow(dist).)
This is a more refined version of what you were doing with the histogram. In fact, you ought to be able to get the exact same answer as the histogram by passing in metric='minkowski', p=1 keyword arguments to cdist, assuming your histogram bin widths are the same in both dimensions, and using 5 again as the threshold.
(PS. If you're interested in another useful function in scipy.spatial.distance, check out my answer that uses pdist to find unique rows/columns in an array.)

Numpy - Find spatial position of a gridpoint in 3-d matrix (knowing the index of that gridpoint)

So I think I might be absolutely on the wrong track here, but basically
I have a 3-d meshgrid, I find all of the distances to a testpoint at all of the points in that grid
import numpy as np
#crystal_lattice structure
x,y,z = np.linspace(-2,2,5),np.linspace(-2,2,5),np.linspace(-2,2,5)
xx,yy,zz = np.meshgrid(x,y,z)
#testpoint
point = np.array([1,1,1])
d = np.sqrt((point[0]-xx)**2 + (point[1]-yy)**2 + (point[2]-zz)**2)
#np.shape(d) = (5, 5, 5)
Then I am trying to find the coordinates of he gridpoint that is the closest to that test point.
My idea was to sort d (flatten then search), get the index of the lowest value.
low_to_hi_d = np.sort(d, axis=None) # axis=0 flattens the d, going to flatten the entire d array and then search
lowest_val = low_to_hi_d[0]
index = np.where(d == lowest_val)
#how do I get the spatial coordinates of my index, not just the position in ndarray (here the position in ndarray is (3,3,3) but the spatial position is (1,1,1), but if I do d[3,3,3] I get 0 (the value at spatial position (1,1,1))
Use that index on my 3d grid to find the point coordinates (not the d value at that point). I am trying something like this, and I am pretty sure I am overcomplicating it. How can I get the (x,y,z) of the 3-d gridpoint that is closest to my test point?
If you just want to find the coordinates of the closest point you are right, you're on the wrong track. There is no point in generating a meshgrid and calculate the distance on so many duplicates. You can do it in every dimension easily and independently:
import numpy as np
x,y,z = np.linspace(-2,2,5),np.linspace(-2,2,5),np.linspace(-2,2,5)
p=np.array([1,1,1])
closest=lambda x,p: x[np.argmin(np.abs(x-p))]
xc,yc,zc=closest(x,p[0]),closest(y,p[1]),closest(z,p[2])
I'm not completely sure that this is what you want.
You can find the index of the minimum d with:
idx = np.unravel_index(np.argmin(d), d.shape)
(3, 3, 3)
and use this to index your meshgrid:
xx[idx], yy[idx], zz[idx]
(1.0, 1.0, 1.0)