Applying scipy.sparse.linalg.svds returns nan values - numpy

I am starting to use the scipy.sparse library, and when I apply scipy.sparse.linalg.svds to a matrix with zero singular values, I get nan values and a runtime warning.
I am doing this because, in the end, I will work with very large, very sparse non-square matrices (larger than 1100x1000, with sparsity above 0.99) whose entries are only {+1, -1}, and I want to know their rank.
I know approximately what the rank is: it is almost full, so knowing just the smallest singular values tells me the exact rank.
This is why I chose to work with scipy.sparse.linalg.svds and set which='SM'. If the rank is not full, some singular values will be zero. This is my code:
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as la
a = np.array([[0, 0, 0], [0, 0, 0], [1, 1, -1]], dtype='d')
sp_a = sp.csc_matrix(a)
s = la.svds(sp_a, k=2, return_singular_vectors=False, which='SM')
print(s)
The output is:
[ nan 9.45667059e-12]
/usr/lib/python3/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py:1849: RuntimeWarning: invalid value encountered in sqrt
s = np.sqrt(eigvals)
Any thoughts on why this happens?
Is there perhaps another efficient way to find the rank, given that I have a large, non-square, very sparse matrix with almost full rank?
scipy version 1.1.0
numpy version 1.14.5
Linux platform
Thanks in advance
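A plausible reading of the warning, as an assumption rather than a confirmed answer: svds obtains singular values as square roots of eigenvalues of the normal matrix, and an eigenvalue that is mathematically zero can come out slightly negative in floating point, so np.sqrt returns nan. Under that assumption, a minimal sketch that treats nan as a numerically zero singular value and counts the rank with a guessed tolerance:
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as la

a = np.array([[0, 0, 0], [0, 0, 0], [1, 1, -1]], dtype='d')
sp_a = sp.csc_matrix(a)
s = np.nan_to_num(la.svds(sp_a, k=2, return_singular_vectors=False, which='SM'))
tol = 1e-10  # assumed tolerance, tune for your data
# valid only if every zero singular value is among the k computed:
rank = min(sp_a.shape) - np.count_nonzero(s <= tol)
print(rank)  # 1 for this example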

Related

Get 2D elements from 3D numpy array, corresponding to 2D indices

I have a 3D array that could be interpreted as a 2D matrix of positions, where each position is a 2D array of coordinates [x,y].
I then have a list of 2D indices, each one indicating a position in the matrix in terms of [row, column].
I would like to obtain the positions from the matrix corresponding to all these indices.
What I am doing is:
import numpy as np
input_matrix = np.array(
    [[[0.0, 1.5], [3.0, 3.0]], [[7.0, 5.2], [6.0, 7.0]]]
)
indices = np.array([[1, 0], [1, 1]])
selected_elements = np.array([input_matrix[tuple(idx)] for idx in indices])
So for example the 2D element corresponding to the 2D index [1, 0] would be [7.0, 5.2] and so on.
My code works, but I was wondering if there is a better way, for example one that uses NumPy entirely (e.g. without a list comprehension over the multiple 2D indices).
I tried to use numpy.take, but it does not seem to produce the desired results.
You can use:
input_matrix[tuple(indices.T)]
Or, as suggested in the comments:
input_matrix[indices[:,0], indices[:,1]]
Output:
array([[7. , 5.2],
       [6. , 7. ]])
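For what it's worth, the reason this works is NumPy's integer-array indexing: one index array per axis, paired up elementwise. A small sketch of what tuple(indices.T) expands to:
rows, cols = tuple(indices.T)  # rows -> array([1, 1]), cols -> array([0, 1])
input_matrix[rows, cols]       # selects input_matrix[1, 0] and input_matrix[1, 1]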

type hint npt.NDArray number of axis

Given that I know the number of axes, can I specify it in the type hint npt.NDArray (from import numpy.typing as npt)?
I.e., if I know it is a 3D array, how can I do something like npt.NDArray[3, np.float64]?
On Python 3.9 and 3.10 the following does the job for me:
from typing import Literal, Tuple
import numpy as np
data = [[1, 2, 3], [4, 5, 6]]
arr: np.ndarray[Tuple[Literal[2], Literal[3]], np.dtype[np.int_]] = np.array(data)
It is a bit cumbersome, but you might follow numpy issue #16544 for future development on easier specification.
In particular, for now you must declare the full shape and can't only declare the rank of the array.
In the future something like ndarray[Shape[:, :, :], dtype] should be available.
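In the meantime, npt.NDArray itself is parameterized by dtype only, so the most you can express with that alias today is something like the following sketch (the number of axes is not checked):
import numpy as np
import numpy.typing as npt

arr3d: npt.NDArray[np.float64] = np.zeros((2, 3, 4))  # dtype checked, rank not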

Strange behaviour of numpy eigenvector: bug or no bug

NumPy's eigenvector solution differs from Wolfram Alpha and my personal calculation by hand.
>>> import numpy.linalg
>>> import numpy as np
>>> numpy.linalg.eig(np.array([[-2, 1], [2, -1]]))
(array([-3.,  0.]), array([[-0.70710678, -0.4472136 ],
       [ 0.70710678, -0.89442719]]))
Wolfram Alpha https://www.wolframalpha.com/input/?i=eigenvectors+%7B%7B-2,1%7D,%7B%2B2,-1%7D%7D and my personal calculation give the eigenvectors (-1, 1) and (1, 2). The NumPy solution, however, differs.
NumPy's calculated eigenvalues, on the other hand, are confirmed by Wolfram Alpha and by my own calculation.
So, is this a bug in NumPy, or is my understanding of the math too simple? A similar thread, Numpy seems to produce incorrect eigenvectors, attributes the difference to rounding/scaling of the eigenvectors, but here the deviation between the solutions would be massive.
Regards
numpy.linalg.eig normalizes the eigenvectors and returns them as the columns of the second output:
import numpy as np

eig_vectors = np.linalg.eig(np.array([[-2, 1], [2, -1]]))[1]
vec_1 = eig_vectors[:, 0]
vec_2 = eig_vectors[:, 1]
Now these two vectors are just normalized (and possibly sign-flipped) versions of the vectors you calculated, i.e.
print(vec_1 * np.sqrt(2))  # sqrt(2) is the magnitude of [-1, 1]
print(vec_2 * np.sqrt(5))  # sqrt(5) is the magnitude of [1, 2]
So, bottom line: both sets of calculations are equivalent; NumPy just normalizes the results (and the overall sign of an eigenvector is arbitrary).
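A quick check of that claim (a sketch; the sign flip on the second column is expected, since an eigenvector is only defined up to a scalar factor):
import numpy as np

vals, vecs = np.linalg.eig(np.array([[-2, 1], [2, -1]]))
v1 = np.array([-1.0, 1.0]) / np.sqrt(2)  # hand-calculated, normalized
v2 = np.array([1.0, 2.0]) / np.sqrt(5)
print(np.allclose(vecs[:, 0], v1))   # True
print(np.allclose(vecs[:, 1], -v2))  # True (sign flipped)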

python, numpy: matrix must be 2-dimensional

Why does the third line raise ValueError: matrix must be 2-dimensional?
import numpy as np
np.mat([[[1],[2]],[[10],[1,3]]])
np.mat([[[1],[2]],[[10],[1]]])
The reason why this code raises an error is because NumPy tries to determine the dimensionality of your input using nesting levels (nesting levels -> dimensions).
If, at some level, some elements do not have the same length (i.e. they are incompatible), it will create the array using the deepest nesting it can, using the objects as the elements of the array.
For this reason:
np.mat([[[1],[2]],[[10],[1,3]]])
Will give you a matrix of objects (lists), while:
np.mat([[[1],[2]],[[10],[1]]])
would result in a 3D array of numbers which np.mat() does not want to squeeze into a matrix.
Also, please avoid using np.mat() in your code as it is deprecated.
Use np.array() instead.
Incidentally, np.array() would work in both cases and it would give you a (2, 2, 1)-shaped array of int, which you could np.squeeze() into a matrix if you like.
However, it would be better to start from nesting level of 2 if all you want is a matrix:
np.array([[1, 2], [10, 1]])
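For example, a small sketch of the np.array() + np.squeeze() route mentioned above:
import numpy as np

a = np.array([[[1], [2]], [[10], [1]]])
print(a.shape)        # (2, 2, 1)
print(np.squeeze(a))  # [[ 1  2]
                      #  [10  1]]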

Why does MinMaxScaler add lines to image?

I want to normalize the pixel values of an image to the range [0, 1] for each channel (R, G, B).
Minimal Example
#!/usr/bin/env python
import numpy as np
import scipy
from sklearn import preprocessing
original = scipy.misc.imread('Crocodylus-johnsoni-3.jpg')
scipy.misc.imshow(original)
transformed = np.zeros(original.shape, dtype=np.float64)
scaler = preprocessing.MinMaxScaler()
for channel in range(3):
    transformed[:, :, channel] = scaler.fit_transform(original[:, :, channel])
scipy.misc.imsave("transformed.jpg", transformed)
What happens
Taking https://commons.wikimedia.org/wiki/File:Crocodylus-johnsoni-3.jpg,
I get the following "normalized" result (image not shown):
As you can see, there are vertical lines running from top to bottom on the right side. What happened there? It seems to me that the normalization went wrong. If so: how do I fix it?
In scikit-learn, a two-dimensional array with shape (m, n) is usually interpreted as a collection of m samples, with each sample having n features.
MinMaxScaler.fit_transform() transforms each feature, so each column of your array is transformed independently of the others. That results in the vertical "stripes" in the image.
It looks like you intended to scale each color channel independently. To do that using MinMaxScaler, reshape the input so that each channel becomes one column. That is, if the original image has shape (m, n, 3), reshape it to (m*n, 3) before passing it to the fit_transform() method, and then restore the shape of the result to create the transformed array.
For example,
ascolumns = original.reshape(-1, 3)
t = scaler.fit_transform(ascolumns)
transformed = t.reshape(original.shape)
With this, transformed looks exactly like the original image, because it turns out that in the array original the minimum and maximum are 0 and 255, respectively, in each channel:
In [41]: original.min(axis=(0, 1))
Out[41]: array([0, 0, 0], dtype=uint8)
In [42]: original.max(axis=(0, 1))
Out[42]: array([255, 255, 255], dtype=uint8)
So all fit_transform does in this case is transform all the input values to the floating point range [0.0, 1.0] uniformly. If the minimum or maximum was different in one of the channels, the transformed image would look different.
By the way, it is not difficult to perform the transform using pure NumPy. (I'm using Python 3, so in the following the division automatically produces a floating point result. If you are using Python 2, you'll need to convert one of the arguments to floating point, or use from __future__ import division.)
In [58]: omin = original.min(axis=(0, 1), keepdims=True)
In [59]: omax = original.max(axis=(0, 1), keepdims=True)
In [60]: xformed = (original - omin)/(omax - omin)
In [61]: np.allclose(xformed, transformed)
Out[61]: True
(One potential problem with that method is that it will run into division by zero if one of the channels is constant, because then one of the values in omax - omin will be 0.)
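A minimal sketch of one way to guard against that, assuming a constant channel should simply map to 0.0:
omin = original.min(axis=(0, 1), keepdims=True)
omax = original.max(axis=(0, 1), keepdims=True)
rng = (omax - omin).astype(np.float64)
safe = np.where(rng == 0, 1.0, rng)         # avoid dividing by zero
xformed = (original - omin) / safe
xformed = np.where(rng == 0, 0.0, xformed)  # constant channels -> 0.0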