type hint npt.NDArray number of axis - numpy

Given I have the number of axes, can I specify the number of axes to the type hint npt.NDArray (from import numpy.typing as npt)
i.e. if I know it is a 3D array, how can I do npt.NDArray[3, np.float64]

On Python 3.9 and 3.10 the following does the job for me:
data = [[1, 2, 3], [4, 5, 6]]
arr: np.ndarray[Tuple[Literal[2], Literal[3]], np.dtype[np.int_]] = np.array(data)
It is a bit cumbersome, but you might follow numpy issue #16544 for future development on easier specification.
In particular, for now you must declare the full shape and can't only declare the rank of the array.
In the future something like ndarray[Shape[:, :, :], dtype] should be available.

Related

Strange behaviour of numpy eigenvector: bug or no bug

NumPy's eigenvector solution differs from Wolfram Alpha and my personal calculation by hand.
>>> import numpy.linalg
>>> import numpy as np
>>> numpy.linalg.eig(np.array([[-2, 1], [2, -1]]))
(array([-3., 0.]), array([[-0.70710678, -0.4472136 ],
[ 0.70710678, -0.89442719]]))
Wolfram Alpha https://www.wolframalpha.com/input/?i=eigenvectors+%7B%7B-2,1%7D,%7B%2B2,-1%7D%7D and my personal calculation give the eigenvectors (-1, 1) and (2, 1). The NumPy solution however differs.
NumPy's calculated eigenvalues however are confirmed by Wolfram Alpha and my personal calculation.
So, is this a bug in NumPy or is my understanding of math to simple? A similar thread Numpy seems to produce incorrect eigenvectors sees the main difference in rounding/scaling of the eigenvectors but the deviation between the solutions would be massive.
Regards
numpy.linalg.eig normalizes the eigen vectors with the results being the column vectors
eig_vectors = np.linalg.eig(np.array([[-2, 1], [2, -1]]))[1]
vec_1 = eig_vectors[:,0]
vec_2 = eig_vectors[:,1]
now these 2 vectors are just normalized versions of the vectors you calculated ie
print(vec_1 * np.sqrt(2)) # where root 2 is the magnitude of [-1, 1]
print(vec_1 * np.sqrt(5)) # where root 5 is the magnitude of [2, 1]
So bottom line the both sets of calculations are equivalent just Numpy likes to normalze the results.

Whats the difference between `arr[tuple(seq)]` and `arr[seq]`? Relating to Using a non-tuple sequence for multidimensional indexing is deprecated

I am using an ndarray to slice another ndarray.
Normally I use arr[ind_arr]. numpy seems to not like this and raises a FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated use arr[tuple(seq)] instead of arr[seq].
What's the difference between arr[tuple(seq)] and arr[seq]?
Other questions on StackOverflow seem to be running into this error in scipy and pandas and most people suggest the error to be in the particular version of these packages. I am running into the warning running purely in numpy.
Example posts:
FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated use `arr[tuple(seq)]` instead of `arr[seq]`
FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated use `arr[tuple(seq)]`
FutureWarning with distplot in seaborn
MWE reproducing warning:
import numpy as np
# generate a random 2d array
A = np.random.randint(20, size=(7,7))
print(A, '\n')
# define indices
ind_i = np.array([1, 2, 3]) # along i
ind_j = np.array([5, 6]) # along j
# generate index array using meshgrid
ind_ij = np.meshgrid(ind_i, ind_j, indexing='ij')
B = A[ind_ij]
print(B, '\n')
C = A[tuple(ind_ij)]
print(C, '\n')
# note: both produce the same result
meshgrid returns a list of arrays:
In [50]: np.meshgrid([1,2,3],[4,5],indexing='ij')
Out[50]:
[array([[1, 1],
[2, 2],
[3, 3]]), array([[4, 5],
[4, 5],
[4, 5]])]
In [51]: np.meshgrid([1,2,3],[4,5],indexing='ij',sparse=True)
Out[51]:
[array([[1],
[2],
[3]]), array([[4, 5]])]
ix_ does the same thing, but returns a tuple:
In [52]: np.ix_([1,2,3],[4,5])
Out[52]:
(array([[1],
[2],
[3]]), array([[4, 5]]))
np.ogrid also produces the list.
In [55]: arr = np.arange(24).reshape(4,6)
indexing with the ix tuple:
In [56]: arr[_52]
Out[56]:
array([[10, 11],
[16, 17],
[22, 23]])
indexing with the meshgrid list:
In [57]: arr[_51]
/usr/local/bin/ipython3:1: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
#!/usr/bin/python3
Out[57]:
array([[10, 11],
[16, 17],
[22, 23]])
Often the meshgrid result is used with unpacking:
In [62]: I,J = np.meshgrid([1,2,3],[4,5],indexing='ij',sparse=True)
In [63]: arr[I,J]
Out[63]:
array([[10, 11],
[16, 17],
[22, 23]])
Here [I,J] is the same as [(I,J)], making a tuple of the 2 subarrays.
Basically they are trying to remove a loophole that existed for historical reasons. I don't know if they can change the handling of meshgrid results without causing further compatibility issues.
For Future Readers having this FutureWarning: ... and want to correctly resolve them.
Read the answer of #hpaulj here. And notice the point is when he(she) said:
[...] returns a list of [...]
[...] returns a tuple of [...]
If you don't know why 1. is the point please read the answer: https://stackoverflow.com/a/71487259/5290519, which is also written by him(her). This answer provides:
Why, in the first place, the usage of list as index will cause a warning.
[4] is a problem because in the past, certain lists were interpreted as though they were tuples. This is a legacy case that developers are trying to cleanup, hence the FutureWarning.
A series of concise examples on the (correct) usage of list,tuple,np.array as array index.
If you still don't get the point, try my own answer after I figuring all these complicated concepts: https://stackoverflow.com/a/71493474/5290519

Applying scipy.sparse.linalg.svds returns nan values

I am starting to use the scipy.sparse library, and when I try to apply scipy.sparse.linalg.svds, I get an error if there are zero singular values.
I am doing this because in the end I am going to use very large and very sparse matrices with entries only {+1, -1} which are not square (>1100*1000 size with >0.99 sparsity), and I want to know their rank.
I know approximately what the rank is, it is almost full, so knowing only the last singular values can tell me what is the rank exactly.
This is why I chose to work with scipy.sparse.linalg.svds and set which='LM'. If the rank is not full, there will be singular values which are zero, this is my code:
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as la
a = np.array([[0, 0, 0], [0, 0, 0], [1, 1, -1]], dtype='d')
sp_a = sp.csc_matrix(a)
s = la.svds(sp_a, k=2, return_singular_vectors=False, which='SM')
print(s)
output is
[ nan 9.45667059e-12]
/usr/lib/python3/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py:1849: RuntimeWarning: invalid value encountered in sqrt
s = np.sqrt(eigvals)
Any thoughts on why this happens?
Maybe there is another efficient way to know the rank, knowing that I have a large non-square very sparse matrix with almost full rank?
scipy version 1.1.0
numpy version 1.14.5
Linux platform
Thanks in advance

What does tf.gather_nd intuitively do?

Can you intuitively explain or give more examples about tf.gather_nd for indexing and slicing into high-dimensional tensors in Tensorflow?
I read the API, but it is kept quite concise that I find myself hard to follow the function's concept.
Ok, so think about it like this:
You are providing a list of index values to index the provided tensor to get those slices. The first dimension of the indices you provide is for each index you will perform. Let's pretend that tensor is just a list of lists.
[[0]] means you want to get one specific slice(list) at index 0 in the provided tensor. Just like this:
[tensor[0]]
[[0], [1]] means you want get two specific slices at indices 0 and 1 like this:
[tensor[0], tensor[1]]
Now what if tensor is more than one dimensions? We do the same thing:
[[0, 0]] means you want to get one slice at index [0,0] of the 0-th list. Like this:
[tensor[0][0]]
[[0, 1], [2, 3]] means you want return two slices at the indices and dimensions provided. Like this:
[tensor[0][1], tensor[2][3]]
I hope that makes sense. I tried using Python indexing to help explain how it would look in Python to do this to a list of lists.
You provide a tensor and indices representing locations in that tensor. It returns the elements of the tensor corresponding to the indices you provide.
EDIT: An example
import tensorflow as tf
sess = tf.Session()
x = [[1,2,3],[4,5,6]]
y = tf.gather_nd(x, [[1,1],[1,2]])
print(sess.run(y))
[5, 6]

repmat with interlace or Kronecker product in Tensorflow

Suppose I have a tensor:
A=[[1,2,3],[4,5,6]]
Which is a matrix with 2 rows and 3 columns.
I would like to replicate it, suppose twice, to get the following tensor:
A2 = [[1,2,3],
[1,2,3],
[4,5,6],
[4,5,6]]
Using tf.repmat will clearly replicate it differently, so I tried the following code (which works):
A_tiled = tf.reshape(tf.tile(A, [1, 2]), [4, 3])
Unfortunately, it seems to be working very slow when the number of columns become large. Executing it in Matlab using Kronecker product with a vector of ones (Matlab's "kron") seems to be much faster.
Can anyone help?