How do you concatenate several 2D arrays in numpy?

I would like
np.concatenate((np.array([[5,5],[2,3]]),np.array([[6,4],[7,8]])))
to yield
[ [[5,5],[2,3]], [[6,4],[7,8]] ]
Concatenate doesn't do the trick, but I am lost on how else to do it!

You can use numpy.stack() or numpy.append() (I suggest append if you have a large codebase). Just note that this is NumPy's append function, not the append method of Python's built-in list.
>>> import numpy as np
>>> a = np.array([[5,5],[2,3]])
>>> b = np.array([[6,4],[7,8]])
>>> c = np.append([a], [b], axis=0)
>>> c
# answer:
array([[[5, 5],
        [2, 3]],
       [[6, 4],
        [7, 8]]])
Now, if we go with np.stack() instead:
>>> d = np.stack((a,b))
>>> c == d
# answer:
array([[[ True,  True],
        [ True,  True]],
       [[ True,  True],
        [ True,  True]]])
As you can see, the two results are the same.
For details, see the NumPy reference documentation for numpy.append and numpy.stack.

for anyone wondering np.stack((a,b)) does the trick :)
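To make the difference concrete, here is a small sketch using the arrays from the question. The variable names joined, stacked and also_stacked are only illustrative. It shows that np.concatenate joins along an existing axis, while np.stack adds a new leading axis, which is the nested result the question asks for.

```python
import numpy as np

a = np.array([[5, 5], [2, 3]])
b = np.array([[6, 4], [7, 8]])

# concatenate joins along an existing axis, so the result stays 2-D
joined = np.concatenate((a, b))

# stack inserts a new axis (axis=0 by default), so the result is 3-D
stacked = np.stack((a, b))

# np.array on a list of equal-shaped arrays builds the same 3-D array
also_stacked = np.array([a, b])

print(joined.shape)   # (4, 2)
print(stacked.shape)  # (2, 2, 2)
print(np.array_equal(stacked, also_stacked))  # True
```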

Related

Numpy broadcasting inequality operator

Numpy broadcasting question. I have two arrays similar to these:
>my_array = np.array([[3,1,2,0] , [4,5,2,1]])
>my_array
array([[3, 1, 2, 0],
[4, 5, 2, 1]])
>second_array = np.array([2,5])
>second_array
array([2, 5])
What I want to do is transpose second_array and test, by column, to see if my_array is >= second_array . So the result would be like this:
>final_array = np.array([ [ (3 >= 2), (1>= 2), (2>=2), (0>=2)] , [(4 >=5),(5>=5),(2>=5),(1>=5)]])
>final_array
array([[ True, False, True, False],
[False, True, False, False]], dtype=bool)
I'm pretty new to matrix operations in Numpy (been doing them in R for a long time) so thanks for helping with such an introductory question.
You just need to reshape second_array so that it has appropriate dimensions:
my_array >= second_array.reshape(2,1) # or (-1,1) if height is unknown
Or equivalently:
my_array >= second_array[:,np.newaxis]
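As a quick check of why the reshape is needed: broadcasting aligns shapes from the trailing dimension, so (2, 4) and (2,) do not line up, while (2, 4) and (2, 1) do. The sketch below reuses the arrays from the question; the name thresholds is only illustrative.

```python
import numpy as np

my_array = np.array([[3, 1, 2, 0], [4, 5, 2, 1]])
second_array = np.array([2, 5])

# shape (2,) -> (2, 1): one threshold per row, broadcast across the 4 columns
thresholds = second_array[:, np.newaxis]
final_array = my_array >= thresholds

print(thresholds.shape)  # (2, 1)
print(final_array)
```

This reproduces the final_array given in the question.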

Row wise contain in numpy [duplicate]

I was wondering what is the best way to check a row wise contain in python numpy?
Suppose we have a vector V = [1, 2, 3, 4] and a Matrix M = [[2, 3, 4], [3, 5, 6], [4, 1, 3], [5, 4, 2]] (The number of rows in M is equal to the length of V). After performing row wise contain, I should get (False, False, True, True) because 1 is not in [2, 3, 4] and 2 is not in [3, 5, 6] and 3 is in [4, 1, 3] and 4 is in [5, 4, 2]
What would be the best way to do this operation in python numpy?
Ideally, I do not want to use a for loop. That would obviously work, but it is not the best way to do it. I came up with the idea of doing a subtraction and then counting the number of zeros in the result, which is much faster than using a for loop. However, I wanted to know if there is a better way to do it.
What you're looking for is the in operator. e.g. 1 in [1,2,3] returns True
So given your values of v and m as numpy arrays as follows:
import numpy as np
v = np.array([1,2,3,4])
m = np.array([np.array([2,3,4]), np.array([3,5,6]), np.array([4,1,3]), np.array([5,4,2])])
# Checking row wise contain
result = [(v[i] in m[i]) for i in range(len(v))]
The result is:
>>> [(v[i] in m[i]) for i in range(len(v))]
[False, False, True, True]
Another solution as Divakar pointed out would be to use
>>> (m==v[:,None]).any(1)
array([False, False, True, True], dtype=bool)
However, some rough timing checks (run after import time) give:
>>> start_time=time.time(); (m==v[:,None]).any(1); print(time.time()-start_time)
array([False, False, True, True], dtype=bool)
0.000586032867432
>>> start_time=time.time(); [(v[i] in m[i]) for i in range(len(v))]; print(time.time()-start_time)
[False, False, True, True]
7.20024108887e-05
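For this tiny 4x3 input, the list comprehension is faster. A single time.time() measurement is only a rough guide, though, and the vectorized version is likely to scale better as the number of rows grows.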
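To compare the two approaches more reliably, one option is to time them with timeit on a larger input. The sketch below is only an illustration: the function names rowwise_loop and rowwise_broadcast and the array sizes are assumptions, not part of the original answer, and the timings will vary by machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = 10_000, 3  # hypothetical sizes, chosen only for the demo
m = rng.integers(0, 10, size=(n_rows, n_cols))
v = rng.integers(0, 10, size=n_rows)

def rowwise_loop(v, m):
    # Python-level loop: one `in` test per row
    return np.array([v[i] in m[i] for i in range(len(v))])

def rowwise_broadcast(v, m):
    # Compare each row of m against its own entry of v, then reduce per row
    return (m == v[:, None]).any(axis=1)

# Both approaches must agree before their timings are worth comparing
assert np.array_equal(rowwise_loop(v, m), rowwise_broadcast(v, m))

for fn in (rowwise_loop, rowwise_broadcast):
    seconds = timeit.timeit(lambda: fn(v, m), number=5)
    print(fn.__name__, seconds / 5)
```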

Perform matrix multiplication between two arrays and get result only on masked places

I have two dense matrices, A [200000,10], B [10,100000]. I need to multiply them to get matrix C. I can't do that directly, since the resulting matrix won't fit into the memory. Moreover, I need only a few elements from the resulting matrix, like 1-2% of the total number of elements. I have a third matrix W [200000,100000] which is sparse and has non-zero elements on exactly those places which are interesting to me in the matrix C.
Is there a way to use W as a "mask" so that the resulting matrix C will be sparse and will contain only the needed elements?
Since a matrix multiplication is just a table of dot products, we can just perform the specific dot products we need, in a vectorized fashion.
import numpy as np
from scipy import sparse
iX, iY = W.nonzero()  # row and column indices of the non-zero entries of W
values = np.sum(A[iX] * B[:, iY].T, axis=-1)  # batched dot product
C = sparse.coo_matrix((values, (iX, iY)), shape=W.shape)
First, get the indices of the non-zero entries of W. The (i,j) element of the result is then the dot product of the i-th row of A with the j-th column of B. Store each result as a triple (i,j,res) rather than in a dense matrix; this coordinate (COO) layout is a standard way to store sparse matrices.
Here's one approach using np.einsum for a vectorized solution -
import numpy as np
from scipy import sparse
from scipy.sparse import coo_matrix
# Get row, col for the output array
r,c,_= sparse.find(W)
# Get the sum-reduction using valid rows and corresponding cols from A, B
out = np.einsum('ij,ji->i',A[r],B[:,c])
# Store as sparse matrix
out_sparse = coo_matrix((out, (r, c)), shape=W.shape)
Sample run -
1) Inputs :
In [168]: A
Out[168]:
array([[4, 6, 1, 1, 1],
[0, 8, 1, 3, 7],
[2, 8, 3, 2, 2],
[3, 4, 1, 6, 3]])
In [169]: B
Out[169]:
array([[5, 2, 4],
[2, 1, 3],
[7, 7, 2],
[5, 7, 5],
[8, 5, 0]])
In [176]: W
Out[176]:
<4x3 sparse matrix of type '<type 'numpy.bool_'>'
with 5 stored elements in Compressed Sparse Row format>
In [177]: W.toarray()
Out[177]:
array([[ True, False, False],
[False, False, False],
[ True, True, False],
[ True, False, True]], dtype=bool)
2) Using dense array to perform direct calculations and verify results later on :
In [171]: (A.dot(B))*W.toarray()
Out[171]:
array([[52, 0, 0],
[ 0, 0, 0],
[73, 57, 0],
[84, 0, 56]])
3) Use the proposed code and get the sparse matrix output :
In [172]: # Using proposed codes
...: r,c,_= sparse.find(W)
...: out = np.einsum('ij,ji->i',A[r],B[:,c])
...: out_sparse = coo_matrix((out, (r, c)), shape=W.shape)
...:
4) Finally verify results by converting to dense/array version and checking against direct version -
In [173]: out_sparse.toarray()
Out[173]:
array([[52, 0, 0],
[ 0, 0, 0],
[73, 57, 0],
[84, 0, 56]])
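At the sizes in the question, gathering all the selected rows of A and columns of B at once may itself need a lot of memory. One possible variation, sketched below, processes the non-zero positions in chunks. The function name masked_matmul and the chunk_size parameter are hypothetical and not from either answer; the per-chunk computation is the same einsum as above.

```python
import numpy as np
from scipy import sparse

def masked_matmul(A, B, W, chunk_size=1_000_000):
    """Compute (A @ B) only at the stored positions of the sparse mask W.

    Hypothetical helper: chunk_size bounds how many dot products are
    evaluated (and how many gathered rows/columns are held) at once.
    """
    W = sparse.coo_matrix(W)
    rows, cols = W.row, W.col
    values = np.empty(rows.size, dtype=np.result_type(A, B))
    for start in range(0, rows.size, chunk_size):
        stop = start + chunk_size
        r, c = rows[start:stop], cols[start:stop]
        # One dot product per (row, col) pair in this chunk
        values[start:stop] = np.einsum('ij,ji->i', A[r], B[:, c])
    return sparse.coo_matrix((values, (rows, cols)), shape=W.shape)

# Small inputs taken from the sample run above
A = np.array([[4, 6, 1, 1, 1],
              [0, 8, 1, 3, 7],
              [2, 8, 3, 2, 2],
              [3, 4, 1, 6, 3]])
B = np.array([[5, 2, 4],
              [2, 1, 3],
              [7, 7, 2],
              [5, 7, 5],
              [8, 5, 0]])
W_dense = np.array([[True, False, False],
                    [False, False, False],
                    [True, True, False],
                    [True, False, True]])

# chunk_size=2 forces several chunks even on this tiny example
out_sparse = masked_matmul(A, B, sparse.csr_matrix(W_dense), chunk_size=2)
print(out_sparse.toarray())
```

The printed array should match the dense check (A.dot(B))*W.toarray() from the sample run.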

Numpy creating logical array in the presence of NaNs

I have an array x, from which I would like to extract a logical mask. x contains nan values, and the mask operation raises a warning, which is what I am trying to avoid.
Here is my code:
import numpy as np
x = np.array([[0, 1], [2.0, np.nan]])
mask = np.isfinite(x) & (x > 0)
The resulting mask is correct (array([[False, True], [ True, False]], dtype=bool)), but a warning is raised:
__main__:1: RuntimeWarning: invalid value encountered in greater
How can I construct the mask in a way that avoids comparing against NaNs? I am not trying to suppress the warning (which I know how to do).
We can do it in two steps. First create the mask of finite elements. Then use that mask to index both into itself and into x: x[mask] selects only the finite values, so the comparison never sees a NaN, and assigning the comparison result to mask[mask] updates just those positions while the non-finite positions stay False. So, we would have an implementation like so -
In [35]: x
Out[35]:
array([[ 0., 1.],
[ 2., nan]])
In [36]: mask = np.isfinite(x)
In [37]: mask[mask] = x[mask]>0
In [38]: mask
Out[38]:
array([[False, True],
[ True, False]], dtype=bool)
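A different technique from the one above, offered only as a sketch: NumPy ufuncs accept where and out arguments, so the comparison can be restricted to the finite entries directly. The function name positive_finite_mask is hypothetical. The output array is pre-filled with False because entries skipped by where keep whatever value out already holds.

```python
import warnings
import numpy as np

def positive_finite_mask(x):
    """Return True where x is finite and greater than zero."""
    finite = np.isfinite(x)
    out = np.zeros(x.shape, dtype=bool)
    # Compare only where `finite` is True; skipped entries keep False from `out`
    np.greater(x, 0, out=out, where=finite)
    return out

x = np.array([[0, 1], [2.0, np.nan]])
with warnings.catch_warnings():
    warnings.simplefilter("error")  # turn any warning into an exception
    mask = positive_finite_mask(x)
print(mask)
```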
It looks like masked arrays also work in this case:
In [214]: x = np.array([[0, 1], [2.0, np.nan]])
In [215]: xm = np.ma.masked_invalid(x)
In [216]: xm
Out[216]:
masked_array(data =
[[0.0 1.0]
[2.0 --]],
mask =
[[False False]
[False True]],
fill_value = 1e+20)
In [217]: xm>0
Out[217]:
masked_array(data =
[[False True]
[True --]],
mask =
[[False False]
[False True]],
fill_value = 1e+20)
In [218]: _.data
Out[218]:
array([[False, True],
[ True, False]], dtype=bool)
But other than propagating the masking I don't know how it handles element by element operations like this. The usual fill and compressed steps don't seem relevant.

Is there a simple pad in numpy?

Is there a numpy function that pads an array this way?
import numpy as np
def pad(x, length):
    tmp = np.zeros((length,))
    tmp[:x.shape[0]] = x
    return tmp
x = np.array([1,2,3])
print(pad(x, 5))
Output:
[ 1. 2. 3. 0. 0.]
I couldn't find a way to do it with numpy.pad()
You can use ndarray.resize():
>>> x = np.array([1,2,3])
>>> x.resize(5)
>>> x
array([1, 2, 3, 0, 0])
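Note that this method behaves differently from numpy.resize(), which pads with repeated copies of the array itself. (Consistency is for people who can't remember everything.) It also resizes the array in place and can raise a ValueError if other objects reference the array, which happens easily in interactive sessions; x.resize(5, refcheck=False) skips that check.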
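A short side-by-side sketch of the two resize variants, using illustrative variable names. The copy is there only so the original x is left untouched.

```python
import numpy as np

x = np.array([1, 2, 3])

# numpy.resize returns a new array filled with repeated copies of x
repeated = np.resize(x, 5)

# ndarray.resize changes the array in place and zero-fills the new elements.
# refcheck=False skips the reference check, which is safe here because
# nothing else shares y's memory.
y = x.copy()
y.resize(5, refcheck=False)

print(repeated)  # [1 2 3 1 2]
print(y)         # [1 2 3 0 0]
```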
Sven Marnach's suggestion to use ndarray.resize() is probably the simplest way to do it, but for completeness, here's how it can be done with numpy.pad:
In [13]: x
Out[13]: array([1, 2, 3])
In [14]: np.pad(x, [0, 5-x.size], mode='constant')
Out[14]: array([1, 2, 3, 0, 0])
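If this is needed in several places, the numpy.pad call can be wrapped in a small helper. The name pad_to_length and the fill parameter below are hypothetical. Unlike the pad function in the question, which always returns floats, this sketch keeps the input dtype.

```python
import numpy as np

def pad_to_length(x, length, fill=0):
    """Right-pad a 1-D array with `fill` up to `length` (hypothetical helper)."""
    x = np.asarray(x)
    if x.ndim != 1:
        raise ValueError("pad_to_length expects a 1-D array")
    if length < x.size:
        raise ValueError("length must be at least x.size")
    # (0, n) means: pad nothing on the left, n elements on the right
    return np.pad(x, (0, length - x.size), mode='constant', constant_values=fill)

x = np.array([1, 2, 3])
print(pad_to_length(x, 5))  # [1 2 3 0 0]
```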