I have a numpy array. How can I find which rows are identical and how many times each appears in the matrix?
Thanks.
Dummy example:
A=np.array([[0, 1, 0, 1],[0, 0, 0, 0],[0, 1, 1, 1],[0, 0, 0, 0]])
You can use numpy.unique with axis=0 and return_counts=True:
np.unique(A, axis=0, return_counts=True)
Output:
(array([[0, 0, 0, 0],
        [0, 1, 0, 1],
        [0, 1, 1, 1]]),
 array([2, 1, 1]))
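For example, to see which rows are duplicated, you can unpack the result and keep the rows whose count is greater than 1 (a small usage sketch; rows and counts are just illustrative names):
rows, counts = np.unique(A, axis=0, return_counts=True)
duplicates = rows[counts > 1]   # here: array([[0, 0, 0, 0]]), the row that appears twice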
Let's say I have the following array:
import numpy as np
import scipy.sparse as sp_sparse
a = np.array([[1, 2, 3], [0, 1, 2], [1, 3, 4], [4, 5, 6]])
a = sp_sparse.csr_matrix(a)
and I want to get a submatrix of the sparse array that consists of the first and last rows.
>>> sub_matrix = a[[0, 3], :]
>>> print(sub_matrix)
(0, 0) 1
(0, 1) 2
(0, 2) 3
(1, 0) 4
(1, 1) 5
(1, 2) 6
But I want to keep the original indexing for the selected rows, so for my example, it would be something like:
(0, 0) 1
(0, 1) 2
(0, 2) 3
(3, 0) 4
(3, 1) 5
(3, 2) 6
I know I could do this by setting all the other rows of the dense array to zero and then computing the sparse array again but I want to know if there is a better way to achieve this.
Any help would be appreciated!
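For reference, the dense workaround described above might look something like this (just a sketch of that approach, not a recommendation):
keep = [0, 3]
dense = a.toarray()                       # back to a dense array
mask = np.zeros(dense.shape[0], dtype=bool)
mask[keep] = True
dense[~mask] = 0                          # zero out every other row
sub_matrix = sp_sparse.csr_matrix(dense)  # re-sparsify; original row indices are preserved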
Depending on the indexing, it might be easier to construct the extractor/indexing matrix with the coo style of inputs:
In [129]: from scipy import sparse
In [130]: M = sparse.csr_matrix(np.arange(16).reshape(4,4))
In [131]: M
Out[131]:
<4x4 sparse matrix of type '<class 'numpy.int64'>'
with 15 stored elements in Compressed Sparse Row format>
In [132]: M.A
Out[132]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
A square extractor matrix with the desired "diagonal" values:
In [133]: extractor = sparse.csr_matrix(([1,1],([0,3],[0,3])))
In [134]: extractor
Out[134]:
<4x4 sparse matrix of type '<class 'numpy.int64'>'
with 2 stored elements in Compressed Sparse Row format>
Matrix multiplication in one direction selects columns:
In [135]: M@extractor
Out[135]:
<4x4 sparse matrix of type '<class 'numpy.int64'>'
with 7 stored elements in Compressed Sparse Row format>
In [136]: _.A
Out[136]:
array([[ 0, 0, 0, 3],
[ 4, 0, 0, 7],
[ 8, 0, 0, 11],
[12, 0, 0, 15]])
and in the other, rows:
In [137]: extractor@M
Out[137]:
<4x4 sparse matrix of type '<class 'numpy.int64'>'
with 7 stored elements in Compressed Sparse Row format>
In [138]: _.A
Out[138]:
array([[ 0, 1, 2, 3],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0],
[12, 13, 14, 15]])
In [139]: extractor.A
Out[139]:
array([[1, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 1]])
M[[0,3],:] does the same thing, but with:
In [140]: extractor = sparse.csr_matrix(([1,1],([0,1],[0,3])))
In [142]: (extractor@M).A
Out[142]:
array([[ 0, 1, 2, 3],
[12, 13, 14, 15]])
Row and column sums are also performed with matrix multiplication:
In [149]: M@np.ones(4,int)
Out[149]: array([ 6, 22, 38, 54])
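The column sums work the same way; a small sketch, multiplying through the transpose so the result stays a 1d array:
M.T @ np.ones(4, int)     # column sums: array([24, 28, 32, 36])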
import numpy as np
import scipy.sparse as sp_sparse
a = np.array([[1, 2, 3], [0, 1, 2], [1, 3, 4], [4, 5, 6]])
a = sp_sparse.csr_matrix(a)
It's probably easiest to just use a selection matrix and then multiply.
idx = np.isin(np.arange(a.shape[0]), [0, 3]).astype(int)
b = sp_sparse.diags(idx, format='csr') @ a
The disadvantage is that this will result in an array of floats instead of integers, but that's easy enough to fix.
>>> b.astype(int).A
array([[1, 2, 3],
[0, 0, 0],
[0, 0, 0],
[4, 5, 6]])
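Alternatively, the float result can be avoided up front by giving the diagonal selection matrix an integer dtype when it is built (a minor variation on the code above):
b = sp_sparse.diags(idx, format='csr', dtype=int) @ a   # integer result directly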
I am trying to find a vectorized way (or at least, better than using a loop) to create a three-dimensional NumPy array from a list of 2D NumPy arrays. Right now, I have a list L that looks something like:
L = [ np.array([[1,2,3], [4,5,6]]), np.array([[8,9,10]]), ...]
Each NumPy array has the same size for the second dimension (in the above case, it is 3). But the first dimension has different sizes.
My goal is to create a 3D NumPy array M that incorporates the above data. I've been trying to use the np.pad() function, since I have a maximum size for the first dimension of each of my arrays, but it looks like it would only operate on the individual elements of my list. I could then do what I wanted using the function and looping over every array. However, I'd like to do this without a loop if possible, using a vectorized approach. Are there any techniques to do this?
This question is related to this one, though I'm hoping to do this over my whole list at once.
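For context, the loop-with-np.pad approach mentioned above might look something like this (a sketch; max_rows is just the assumed common first-dimension size):
max_rows = max(arr.shape[0] for arr in L)
M = np.stack([np.pad(arr, ((0, max_rows - arr.shape[0]), (0, 0)), mode='constant') for arr in L])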
First let's look at the common task of padding 1d arrays to a common size.
In [441]: alist = [np.ones((2,),int),np.zeros((1,),int)+2, np.zeros((3,),int)+3]
In [442]: alist
Out[442]: [array([1, 1]), array([2]), array([3, 3, 3])]
The obvious iterative approach:
In [443]: [np.hstack((arr, np.zeros((3-arr.shape[0]),int))) for arr in alist]
Out[443]: [array([1, 1, 0]), array([2, 0, 0]), array([3, 3, 3])]
In [444]: np.stack(_)
Out[444]:
array([[1, 1, 0],
[2, 0, 0],
[3, 3, 3]])
A clever alternative. It still requires an iteration to determine sizes, but the rest is whole-array "vectorization":
In [445]: sizes = [arr.shape[0] for arr in alist]
In [446]: sizes
Out[446]: [2, 1, 3]
Make the output array with the pad values:
In [448]: res = np.zeros((3,3),int)
Make a clever mask (@Divakar first proposed this):
In [449]: np.array(sizes)[:,None]>np.arange(3)
Out[449]:
array([[ True, True, False],
[ True, False, False],
[ True, True, True]])
then map the 'flattened' inputs to res:
In [450]: res[_]=np.hstack(alist)
In [451]: res
Out[451]:
array([[1, 1, 0],
[2, 0, 0],
[3, 3, 3]])
I think this process can be extended to your 2d=>3d case, but it will take a bit of work. I tried doing it directly and found I was getting lost in applying the mask. That's why I decided to first lay out the 1d=>2d case. There's enough thinking-outside-the-box that I have to work out the details fresh each time.
2d=>3d
In [457]: a2list = [np.ones((2,3),int),np.zeros((1,3),int)+2, np.zeros((3,3),int)+3]
In [458]: [np.vstack((arr, np.zeros((3-arr.shape[0],arr.shape[1]),int))) for arr in a2list]
Out[458]:
[array([[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]),
array([[2, 2, 2],
[0, 0, 0],
[0, 0, 0]]),
array([[3, 3, 3],
[3, 3, 3],
[3, 3, 3]])]
In [459]: np.stack(_)
Out[459]:
array([[[1, 1, 1],
[1, 1, 1],
[0, 0, 0]],
[[2, 2, 2],
[0, 0, 0],
[0, 0, 0]],
[[3, 3, 3],
[3, 3, 3],
[3, 3, 3]]])
Now for the 'vectorized' approach:
In [460]: sizes = [arr.shape[0] for arr in a2list]
In [461]: sizes
Out[461]: [2, 1, 3]
In [462]: np.array(sizes)[:,None]>np.arange(3)
Out[462]:
array([[ True, True, False],
[ True, False, False],
[ True, True, True]])
In [463]: res = np.zeros((3,3,3),int)
and the corresponding indices from the mask:
In [464]: I,J=np.nonzero(Out[462])
In [465]: I
Out[465]: array([0, 0, 1, 2, 2, 2])
In [466]: J
Out[466]: array([0, 1, 0, 0, 1, 2])
In [467]: res[I,J,:] = np.vstack(a2list)
In [468]: res
Out[468]:
array([[[1, 1, 1],
[1, 1, 1],
[0, 0, 0]],
[[2, 2, 2],
[0, 0, 0],
[0, 0, 0]],
[[3, 3, 3],
[3, 3, 3],
[3, 3, 3]]])
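Wrapped up as a reusable helper, the same mask trick might look like this (just a sketch; pad_stack is an illustrative name, not from the question):
def pad_stack(arrs, fill=0):
    # stack 2d arrays with differing row counts into a 3d array, padding short ones with `fill`
    sizes = np.array([a.shape[0] for a in arrs])
    res = np.full((len(arrs), sizes.max(), arrs[0].shape[1]), fill, dtype=arrs[0].dtype)
    mask = sizes[:, None] > np.arange(sizes.max())   # same mask as above
    res[mask] = np.vstack(arrs)                      # fill only the valid (i, j) rows
    return res
pad_stack(a2list)    # reproduces the (3,3,3) result shown above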
In [57]: dW = np.zeros((2,2), int)
In [58]: x = [[0,1,1,0,1, 0], [1, 1, 1, 1, 0, 0]]
In [59]: np.add.at(dW,x,1)
/home/infinity/anaconda3/bin/ipython:1: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
I was trying to create a confusion matrix using numpy, but I got the warning above. How can I fix that?
If we supply an array rather than a list, we don't get the future warning:
In [11]: dW = np.zeros((2,2), int)
    ...: x = [[0, 1, 1, 0, 1, 0], [1, 1, 1, 1, 0, 0]]
In [12]: dW
Out[12]:
array([[0, 0],
[0, 0]])
In [13]: x
Out[13]: [[0, 1, 1, 0, 1, 0], [1, 1, 1, 1, 0, 0]]
In [14]: np.add.at(dW,np.array(x),1)
In [15]: dW
Out[15]:
array([[5, 5],
[7, 7]])
According to the docs,
indices : array_like or tuple
Array like index object or slice object for indexing into first
operand. If first operand has multiple dimensions, indices can be a
tuple of array like index objects or slice objects.
In [17]: np.add.at(dW,tuple(x),1)
In [18]: dW
Out[18]:
array([[6, 7],
[8, 9]])
In [19]: tuple(x)
Out[19]: ([0, 1, 1, 0, 1, 0], [1, 1, 1, 1, 0, 0])
Recent numpy versions have been tightening the indexing rules. In the past, lists were sometimes allowed in contexts that really expected tuples or arrays; this FutureWarning is part of that tightening.
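The distinction is easy to see with plain indexing (a quick illustration using the dW and x above):
dW[tuple(x)]       # element indexing, pairs (0,1), (1,1), ... -> shape (6,)
dW[np.array(x)]    # array indexing, whole rows 0 and 1 -> shape (2, 6, 2)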
===
As commented:
In [22]: dW = np.zeros((2,2), int)
In [23]: np.add.at(dW,tuple(x),1)
In [24]: dW
Out[24]:
array([[1, 2],
[1, 2]])
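For the original confusion-matrix goal, the same pattern works with a tuple of (true, predicted) label arrays (a small sketch with made-up labels):
true = np.array([0, 1, 1, 0, 1, 0])
pred = np.array([1, 1, 1, 1, 0, 0])
cm = np.zeros((2, 2), int)
np.add.at(cm, (true, pred), 1)   # cm[i, j] counts label i predicted as j
# cm -> array([[1, 2],
#              [1, 2]])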
When I updated to the most recent version of numpy, a lot of my code broke because now every time I call np.dot() on a matrix and an array, it returns a 1xn matrix rather than simply an array.
This causes an error when I try to multiply the resulting vector/array by a matrix again.
Example:
A = np.matrix( [ [4, 1, 0, 0], [1, 5, 1, 0], [0, 1, 6, 1], [1, 0, 1, 4] ] )
x = np.array([0, 0, 0, 0])
print(x)
x1 = np.dot(A, x)
print(x1)
x2 = np.dot(A, x1)
print(x2)
output:
[0 0 0 0]
[[0 0 0 0]]
Traceback (most recent call last):
File "review.py", line 13, in <module>
x2 = np.dot(A, x1)
ValueError: shapes (4,4) and (1,4) not aligned: 4 (dim 1) != 1 (dim 0)
I would expect that either dot of a matrix and vector would return a vector, or dot of a matrix and 1xn matrix would work as expected.
Using the transpose of x doesn't fix this, nor does using A @ x, A.dot(x), or any variation of np.matmul(A, x).
Your arrays:
In [24]: A = np.matrix( [ [4, 1, 0, 0], [1, 5, 1, 0], [0, 1, 6, 1], [1, 0, 1, 4] ] )
...: x = np.array([0, 0, 0, 0])
In [25]: A.shape
Out[25]: (4, 4)
In [26]: x.shape
Out[26]: (4,)
The dot:
In [27]: np.dot(A,x)
Out[27]: matrix([[0, 0, 0, 0]]) # (1,4) shape
Let's try the same, but with a ndarray version of A:
In [30]: A.A
Out[30]:
array([[4, 1, 0, 0],
[1, 5, 1, 0],
[0, 1, 6, 1],
[1, 0, 1, 4]])
In [31]: np.dot(A.A, x)
Out[31]: array([0, 0, 0, 0])
The result is (4,) shape. That makes sense: (4,4) dot (4,) => (4,)
np.dot(A,x) is doing the same calculation, but returning a np.matrix. That by definition is a 2d array, so the (4,) is expanded to (1,4).
I don't have an older version to test this on, and am not aware of any changes.
If x is a (4,1) matrix, then the result is (4,4) dot (4,1) => (4,1):
In [33]: np.matrix(x)
Out[33]: matrix([[0, 0, 0, 0]])
In [34]: np.dot(A, np.matrix(x).T)
Out[34]:
matrix([[0],
[0],
[0],
[0]])
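If the goal is simply to keep chaining np.dot calls, one option is to drop np.matrix and work with plain ndarrays throughout (a sketch):
A = np.asarray(A)     # or A.A; plain ndarray instead of np.matrix
x1 = A.dot(x)         # (4,) array
x2 = A.dot(x1)        # aligns again: (4,4) dot (4,) -> (4,)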