I have a 2D matrix A and a vector B. I want to find all row indices of elements in A that are also contained in B.
A = np.array([[1,9,5], [8,4,9], [4,9,3], [6,7,5]], dtype=int)
B = np.array([2, 4, 8, 10, 12, 18], dtype=int)
My current solution compares A to one element of B at a time, which is horribly slow:
res = np.array([], dtype=int)
for i in range(B.shape[0]):
    cres, _ = (B[i] == A).nonzero()
    res = np.append(res, cres)
res = np.unique(res)
The following Matlab statement would solve my issue:
find(any(reshape(any(reshape(A, prod(size(A)), 1) == B, 2),size(A, 1),size(A, 2)), 2))
However, comparing a row and a column vector in Numpy does not create a Boolean intersection matrix as it does in Matlab.
Is there a proper way to do this in Numpy?
We can use np.isin masking.
To get all the row numbers, it would be -
np.where(np.isin(A,B).T)[1]
If you need them split based on each element's occurrence -
[np.flatnonzero(i) for i in np.isin(A,B).T if i.any()]
Posted MATLAB code seems to be doing broadcasting. So, an equivalent one would be -
np.where(B[:,None,None]==A)[1]
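Note that the np.where expressions above can list the same row more than once when several of its elements match. If each matching row should appear only once, a minimal sketch, assuming the A and B from the question (the names mask, rows and rows_bc are introduced here for illustration), could look like this:

```python
import numpy as np

A = np.array([[1, 9, 5], [8, 4, 9], [4, 9, 3], [6, 7, 5]], dtype=int)
B = np.array([2, 4, 8, 10, 12, 18], dtype=int)

# Boolean mask with the same shape as A: True where the element occurs in B.
mask = np.isin(A, B)

# A row qualifies if any of its elements is in B.
rows = np.flatnonzero(mask.any(axis=1))
print(rows)  # [1 2]

# Broadcasting equivalent: compare every element of A with every element of B,
# then collapse the column axis and the B axis.
rows_bc = np.flatnonzero((A[:, :, None] == B).any(axis=(1, 2)))
print(rows_bc)  # [1 2]
```

The broadcasting variant builds an intermediate array of shape (4, 3, 6), so np.isin is probably the better fit when A or B is large.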
I am trying to modify the following code snippet to not use reshape
a = np.random.randn(1, 2, 3, 5)
b = np.random.randn(2, 5, 10)
np.einsum("ijkl,mjl->kim", a, b.reshape(10,2,5))
At first I thought that the reshape is just transposing the operand, but it seems more complicated than that. Is it possible to do this operation without reshaping?
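To illustrate why the reshape is not just a transpose, here is a small sketch. It uses np.arange instead of random data (an assumption made here so the result is reproducible) and only shows the difference between the two operations; it does not settle whether the einsum can be rewritten without reshape.

```python
import numpy as np

b = np.arange(2 * 5 * 10).reshape(2, 5, 10)

reshaped = b.reshape(10, 2, 5)     # keeps the flat element order, changes axis lengths
transposed = b.transpose(2, 0, 1)  # keeps the elements, permutes the axes

print(reshaped.shape == transposed.shape)           # True
print(np.array_equal(reshaped, transposed))         # False
print(np.array_equal(reshaped.ravel(), b.ravel()))  # True
```

Both results have shape (10, 2, 5), but reshape regroups the flattened data while transpose moves whole axes, so the elements end up in different positions.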
I want to split a long vector into smaller unequal pieces, do a summation on each piece and gather the results into a new vector.
I need to do this in pytorch but I am also interested to see how this is done with numpy.
This can easily be accomplished by splitting the vector.
sizes = [3, 7, 5, 9]
X = torch.ones(sum(sizes))
Y = torch.tensor([s.sum() for s in torch.split(X, sizes)])
or with np.ones and np.split.
Is there a more efficient way to do this?
Edit:
Inspired by the first comment:
indices = np.cumsum([0]+sizes)[:-1]
Y = np.add.reduceat(X, indices.tolist())
solves it for numpy. I am still looking for a solution with pytorch.
index_add_ is your friend!
# inputs
sizes = torch.tensor([3, 7, 5, 9], dtype=torch.long)
x = torch.ones(sizes.sum())
# prepare an index vector for summation (what elements of x are summed to each element of y)
ind = torch.zeros(sizes.sum(), dtype=torch.long)
ind[torch.cumsum(sizes, dim=0)[:-1]] = 1
ind = torch.cumsum(ind, dim=0)
# prepare the output
y = torch.zeros(len(sizes))
# do the actual summation
y.index_add_(0, ind, x)
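The index vector built above can be checked step by step in NumPy. The following is a sketch, not a replacement for the PyTorch code: it assumes np.add.at as the NumPy counterpart of index_add_ and compares the result with np.add.reduceat.

```python
import numpy as np

sizes = np.array([3, 7, 5, 9])
x = np.ones(sizes.sum())

# Mark the first element of every piece except the first one ...
ind = np.zeros(sizes.sum(), dtype=int)
ind[np.cumsum(sizes)[:-1]] = 1
# ... so that the running sum gives each element the number of its piece.
ind = np.cumsum(ind)

# Unbuffered scatter-add: y[ind[i]] += x[i] for every i.
y = np.zeros(len(sizes))
np.add.at(y, ind, x)
print(y)  # [3. 7. 5. 9.]

# Cross-check against reduceat, using the start offset of each piece.
starts = np.cumsum(np.concatenate(([0], sizes)))[:-1]
y_reduceat = np.add.reduceat(x, starts)
```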
I have a situation similar to the following:
import numpy as np
a = np.random.rand(55, 1, 3)
b = np.random.rand(55, 626, 3)
Here the shapes represent the number of observations, then the number of time slices per observation, then the number of dimensions of the observation at the given time slice. So a is a full representation of 3 dimensions for each of the 55 observations at one new time interval.
I'd like to stack a and b into an array with shape 55, 627, 3. How can one accomplish this in numpy? Any suggestions would be greatly appreciated!
To follow up on Divakar's answer above, the axis argument in numpy is the index of a given dimension within an array's shape. Here I want to stack a and b by virtue of their middle shape value, which is at index = 1:
import numpy as np
a = np.random.rand(5, 1, 3)
b = np.random.rand(5, 100, 3)
# create the stacked result; with these smaller example arrays its shape is 5, 101, 3
stacked = np.concatenate((b, a), axis=1)
# validate that a was appended to the end of b
print(stacked[:, -1, :], '\n\n\n', a.squeeze())
This returns:
[[0.72598529 0.99395887 0.21811998]
[0.9833895 0.465955 0.29518207]
[0.38914048 0.61633291 0.0132326 ]
[0.05986115 0.81354865 0.43589306]
[0.17706517 0.94801426 0.4567973 ]]
[[0.72598529 0.99395887 0.21811998]
[0.9833895 0.465955 0.29518207]
[0.38914048 0.61633291 0.0132326 ]
[0.05986115 0.81354865 0.43589306]
[0.17706517 0.94801426 0.4567973 ]]
A purist might instead use np.all(stacked[:, -1, :] == a.squeeze()) to validate this equivalence. All glory to @Divakar!
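As a sketch of that stricter check, using the shapes from the original question (55, 1, 3 and 55, 626, 3), one could assert both the resulting shape and the contents instead of comparing printed output:

```python
import numpy as np

a = np.random.rand(55, 1, 3)
b = np.random.rand(55, 626, 3)

# Join along axis 1, the time-slice axis; the other two axes must already match.
stacked = np.concatenate((b, a), axis=1)

assert stacked.shape == (55, 627, 3)
# The last time slice is a, and everything before it is b, unchanged.
assert np.array_equal(stacked[:, -1, :], a[:, 0, :])
assert np.array_equal(stacked[:, :-1, :], b)
```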
Strictly for the curious, the use case for this concatenation is a kind of wonky data preparation pipeline for a Long Short Term Memory Neural Network. In that kind of network, the training data shape should be number_of_observations, number_of_time_intervals, number_of_dimensions_per_observation. I am generating new predictions of each object at a new time interval, so those predictions have shape number_of_observations, 1, number_of_dimensions_per_observation. To visualize the sequence of observations' positions over time, I want to add the new positions to the array of previous positions, hence the question above.
I have a flat array a:
a = numpy.array([0, 1, 1, 2, 3, 1, 2])
And an array b of indices marking the start of each "chunk":
b = numpy.array([0, 4])
I know I can find the maximum in each "chunk" using a reduction:
m = numpy.maximum.reduceat(a,b)
>>> array([2, 3], dtype=int32)
But... Is there a way to find the index of the maximum within a chunk (like numpy.argmax), with vectorized operations (no lists, loops)?
Borrowing the idea from this post.
Steps involved :
Offset all elements in a group by a limit-offset. Sort them globally, thus limiting each group to stay at their positions, but sorting the elements within each group.
In the sorted array, we would look for the last element, which would be the group max. Their indices would be the argmax after offsetting down for the group lengths.
Thus, a vectorized implementation would be -
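import numpy as np

def numpy_argmax_reduceat(a, b):
    n = a.max()+1 # limit-offset
    # number of elements in each group
    grp_count = np.append(b[1:] - b[:-1], a.size - b[-1])
    # per-element offset: 0 for the first group, n for the second, ...
    shift = n*np.repeat(np.arange(grp_count.size), grp_count)
    sortidx = (a+shift).argsort()
    # position of the last (largest) element of each group after sorting
    grp_shifted_argmax = np.append(b[1:],a.size)-1
    return sortidx[grp_shifted_argmax] - b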
As a minor tweak and possibly faster one, we could alternatively create shift with cumsum and thus have a variation of the earlier approach, like so -
def numpy_argmax_reduceat_v2(a, b):
    n = a.max()+1 # limit-offset
    id_arr = np.zeros(a.size,dtype=int)
    id_arr[b[1:]] = 1
    shift = n*id_arr.cumsum()
    sortidx = (a+shift).argsort()
    grp_shifted_argmax = np.append(b[1:],a.size)-1
    return sortidx[grp_shifted_argmax] - b
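To see the offset-and-sort idea at work, here is a self-contained sketch that can be checked against a plain loop. It is a variation, not the code above: the function names are made up for this example, the offset is taken as a.max() - a.min() + 1 so that negative values do not make neighbouring groups overlap, and a stable sort is requested. It assumes integer data and strictly increasing chunk starts. With ties for the maximum, the stable sort yields the last tied position, whereas np.argmax yields the first.

```python
import numpy as np

def argmax_reduceat_sketch(a, starts):
    """Index of the maximum within each chunk of a (chunks begin at starts)."""
    a = np.asarray(a)
    starts = np.asarray(starts)
    span = a.max() - a.min() + 1          # offset wide enough to separate groups
    group_id = np.zeros(a.size, dtype=int)
    group_id[starts[1:]] = 1
    group_id = group_id.cumsum()          # 0 for the first chunk, 1 for the second, ...
    order = np.argsort(a + span * group_id, kind="stable")
    last_in_group = np.append(starts[1:], a.size) - 1
    # The last sorted entry of each group is its maximum; subtracting the
    # chunk start turns the global position into a position within the chunk.
    return order[last_in_group] - starts

def argmax_loop_reference(a, starts):
    """Loop-based reference used only to check the vectorized sketch."""
    a = np.asarray(a)
    ends = np.append(starts[1:], a.size)
    return np.array([np.argmax(a[s:e]) for s, e in zip(starts, ends)])

a = np.array([0, 1, 1, 2, 3, 1, 2])
b = np.array([0, 4])
print(argmax_reduceat_sketch(a, b))  # [3 0]
print(argmax_loop_reference(a, b))   # [3 0]
```

For the sample data, the maximum of the first chunk [0, 1, 1, 2] sits at position 3 and the maximum of the second chunk [3, 1, 2] at position 0.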
I'm a newbie and was trying something in python 2.7.2 with Numpy which wasn't working as expected so wanted to check if there was something basic I was misunderstanding.
I was calculating a value for a triangle (trinormals) and then updating a value per point of the triangle (vertnormals) using an array of triangle indexes (trivertidx). As a loop I was calculating:
for itri in range(ntriangles):
    vertnormals[(trivertidx[itri,0]),:] += trinormals[itri,:]
    vertnormals[(trivertidx[itri,1]),:] += trinormals[itri,:]
    vertnormals[(trivertidx[itri,2]),:] += trinormals[itri,:]
As this was a little slow I thought it could be modified to :
vertnormals[(trivertidx[:,0]),:] += trinormals[:,:]
vertnormals[(trivertidx[:,1]),:] += trinormals[:,:]
vertnormals[(trivertidx[:,2]),:] += trinormals[:,:]
However this doesn't give the same results. Is there another simpler way to write the loop? Any pointers appreciated. Note the intent here was to get a single value for each entry in vertnormals and then normalise the result.
Numpy has a function bincount that can be very helpful in situations like this. The two lines below are the same when the elements of index are unique, but different when index has repeated values:
A[index] += W
A += np.bincount(index, W, minlength=len(A))
I believe you want the behavior of the second, but your code is a little more complex because A, index, and W are not 1d. You can try something like this,
import numpy as np
N = len(vertnormals)
for j in range(vertnormals.shape[-1]):
    vertnormals[:, j] += np.bincount(trivertidx[:, 0], trinormals[:, j], minlength=N)
    vertnormals[:, j] += np.bincount(trivertidx[:, 1], trinormals[:, j], minlength=N)
    vertnormals[:, j] += np.bincount(trivertidx[:, 2], trinormals[:, j], minlength=N)
Hope that helps.
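To check the bincount approach against the original loop, here is a small sketch on a made-up mesh. The three triangles, their values and n_vertices are assumptions chosen for illustration, with trinormals taken to be of shape (n, 3) as in the question's code.

```python
import numpy as np

# Hypothetical mesh: 5 vertices, 3 triangles, one 3-component value per triangle.
trivertidx = np.array([[0, 1, 2], [0, 2, 3], [2, 3, 4]])
trinormals = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
n_vertices = 5

# Reference: the explicit loop from the question.
expected = np.zeros((n_vertices, 3))
for itri in range(len(trivertidx)):
    for corner in range(3):
        expected[trivertidx[itri, corner]] += trinormals[itri]

# bincount version: one weighted count per triangle corner and per component.
vertnormals = np.zeros((n_vertices, 3))
for j in range(3):
    for corner in range(3):
        vertnormals[:, j] += np.bincount(
            trivertidx[:, corner], weights=trinormals[:, j], minlength=n_vertices
        )

# The fancy-indexed += from the question, for comparison.
buffered = np.zeros((n_vertices, 3))
for corner in range(3):
    buffered[trivertidx[:, corner], :] += trinormals

print(np.allclose(vertnormals, expected))  # True
print(np.allclose(buffered, expected))     # False
```

Vertex 0 is the first corner of two triangles here, so the fancy-indexed version keeps only one of the two contributions while bincount adds both.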
If I am understanding your question well, you have m points from which you have formed n triangles, and trivertidx is an array of shape (n, 3) holding values in the range [0, m), where trivertidx[j] is the list of the 3 points making up the j-th triangle.
trinormals then is an array of shape (n,) holding a value assigned to each triangle, and you want vertnormals to be an array of shape (m,) holding, for each point, the sum of the values assigned to each triangle that point is a vertex of.
If the above is right, the following example should show why your second code is not working properly:
>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[1,2,0,2]] += 1
>>> a
array([1, 2, 3, 3, 4])
Even though the element in position 2 shows up twice in the left hand side, what happens is that two copies of the same value have 1 added, and then the incremented value is copied twice to the same position.
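For comparison, NumPy ufuncs provide an unbuffered at method that accumulates repeated indices. This is an alternative not used in the answers here, shown as a minimal sketch on the same toy array:

```python
import numpy as np

a = np.arange(5)
# Unbuffered in-place add: index 2 appears twice, so it is incremented twice.
np.add.at(a, [1, 2, 0, 2], 1)
print(a)  # [1 2 4 3 4]
```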
To vectorize this summation you would need an array of shape (n, m) where the value at position [j, k] is True if vertex k is part of triangle j, False if not. You could build that array like this:
trivert = np.zeros((n, m), dtype='bool')
trivert[np.arange(n).reshape(n, 1), trivertidx] = 1
Once you have this array, you can get your sums for each vertex as
vertnormals = np.sum(trivert * trinormals.reshape(-1, 1), axis=0)
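Here is a sketch of the mask approach on a made-up example, with n = 3 triangles, m = 5 points and one scalar value per triangle (all assumed for illustration), checked against an explicit loop:

```python
import numpy as np

n, m = 3, 5
trivertidx = np.array([[0, 1, 2], [0, 2, 3], [2, 3, 4]])  # hypothetical triangles
trinormals = np.array([1.0, 2.0, 3.0])                    # one value per triangle

# trivert[j, k] is True when point k is a vertex of triangle j.
trivert = np.zeros((n, m), dtype=bool)
trivert[np.arange(n).reshape(n, 1), trivertidx] = True

# Weight each triangle's row by its value and sum over triangles.
vertnormals = np.sum(trivert * trinormals.reshape(-1, 1), axis=0)
print(vertnormals)  # [3. 1. 6. 5. 3.]

# Loop-based reference for comparison.
expected = np.zeros(m)
for j in range(n):
    for k in trivertidx[j]:
        expected[k] += trinormals[j]
```

The mask holds n * m entries, so for large meshes its memory use may become the limiting factor.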