Delete rows from an ndarray in Python - indexing

I have a 2D array A, which contains the x and y coordinates of points:
array([[ 0,  0],
       [ 0,  0],
       [ 0,  0],
       [ 3,  4],
       [ 4,  1],
       [ 5, 10],
       [ 9,  7]])
As you can see, the point (0, 0) appears several times.
I want to delete this point so that the array looks like this:
array([[ 3,  4],
       [ 4,  1],
       [ 5, 10],
       [ 9,  7]])
Since the real array is very large, it is important to do this without for loops; otherwise it takes a very long time.
I'm new to Python, but I'm used to MATLAB, where I can solve it very easily with:
A(A(:,1) == 0 & A(:,2) == 0, :) = []
I thought it would be the same or very similar in Python, but I can't figure it out and am totally stuck. Errors like "use a.any()/all()" or "ufunc 'bitwise_and' not supported for the input types" appear, and I don't know what I should change.

Technically, what you are doing in MATLAB is not deleting elements from A. What you are actually doing is creating a new array that lacks the selected rows of A. It is equivalent to:
>> A = A (A(:,1) ~= 0 | A(:,2) ~= 0, :);
You can do exactly the same thing in numpy:
>>> a = a[(a[:,0] != 0) | (a[:,1] != 0), :]
However, thanks to numpy's automatic broadcasting, you can make this simpler:
>>> a = a[(a != [0, 0]).any(1)]
This works for any target row, as long as it has the same number of columns as a.
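A self-contained sketch of both versions on the example data from the question. It also shows the likely cause of the errors in the question: in Python, & binds more tightly than ==, so each comparison must be wrapped in parentheses (and the keyword `and` cannot combine boolean arrays element-wise). The names is_origin, kept and target are illustrative.

```python
import numpy as np

A = np.array([[0, 0], [0, 0], [0, 0], [3, 4], [4, 1], [5, 10], [9, 7]])

# Parentheses are required: in Python, & binds more tightly than ==.
is_origin = (A[:, 0] == 0) & (A[:, 1] == 0)
kept = A[~is_origin]  # boolean indexing builds a new array without those rows

# Broadcasting version: compare every row with the target row in one go.
target = np.array([0, 0])
kept_bc = A[(A != target).any(axis=1)]

print(kept)
```

Both expressions return a new array; as in the MATLAB version, nothing is deleted in place, the name is simply rebound to the filtered result.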

How can I efficiently mask out certain pairs in (2, N) tensor?

I have a torch tensor edge_index of shape (2, N) that represents edges in a graph. For each (x, y) there is also a (y, x), where x and y are node IDs (ints). During the forward pass of my model I need to mask out certain edges. So, for example, I have:
import torch

n1 = [0, 3, 4]  # list of node ids as x
n2 = [1, 2, 1]  # list of node ids as y
# actual edges as (x, y) and (y, x)
edge_index = torch.tensor([[1, 2, 0, 1, 3, 4, 2, 3, 1, 4, 2, 4],
                           [2, 1, 1, 0, 4, 3, 3, 2, 4, 1, 4, 2]])
# do something that efficiently removes (x, y) and (y, x) edges as formed by n1 and n2
Final edge_index should look like:
>>> edge_index
[[1, 2, 3, 4, 2, 4],
 [2, 1, 4, 3, 4, 2]]
Preferably, I'd like an efficient way to build a boolean mask that I can apply to edge_index, e.g. as edge_index[:, mask] or something like that.
It could also be done in numpy, but I'd like to avoid converting back and forth.
Edit #1:
If that can't be done, I could instead arrange (by building a dict/index once at the beginning) to have, in place of n1 and n2, a single tensor holding the positions I need to exclude, e.g. _except=[2, 3, 6, 7, 8, 9].
Is there a way to get the desired result by "telling" edge_index to drop the indices in _except? edge_index[:, _except] gives me the ones I want to get rid of; I need the complement of that operation.
Edit #2:
I managed to do it like this:
mask = torch.ones(edge_index.shape[1], dtype=torch.bool)
for i in range(len(n1)):
    mask = mask & ~(torch.tensor([n1[i], n2[i]], dtype=torch.long) == edge_index.T).all(dim=1) & ~(torch.tensor([n2[i], n1[i]], dtype=torch.long) == edge_index.T).all(dim=1)
edge_index[:, mask]
but it is too slow and I can't use it. How can I speed it up?
Edit #3: I managed to solve the Edit #1 variant efficiently with:
mask = torch.ones(edge_index.shape[1], dtype=torch.bool)
mask[_except] = False
edge_index[:, mask]
Still interested in solving the original problem if someone comes up with something...
If you're OK with the approach you suggested in Edit #1,
you can get the complementary result with:
edge_index[:, [i for i in range(edge_index.shape[1]) if i not in _except]]
Hope this is fast enough for your requirements.
Edit 1:
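import torch
from functools import reduce
# (P, 2) pairs (x, y), then append the reversed pairs (y, x)
ids = torch.stack([torch.tensor(n1), torch.tensor(n2)], dim=1)
ids = torch.cat([ids, ids[:, [1, 0]]], dim=0)
# (num_pairs, 2, num_edges): compare every pair with every edge, coordinate by coordinate
res = edge_index.unsqueeze(0).repeat(ids.shape[0], 1, 1) == ids.unsqueeze(2).repeat(1, 1, edge_index.shape[1])
# an edge is dropped if both of its coordinates match any pair
mask = ~reduce(lambda x, y: x | (reduce(lambda p, q: p & q, y)), res, reduce(lambda p, q: p & q, res[0]))
edge_index[:, mask]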
Edit 2:
ids = torch.stack([torch.tensor(n1), torch.tensor(n2)], dim=1)
ids = torch.cat([ids, ids[:, [1, 0]]], dim=0)
res = edge_index.unsqueeze(0).repeat(ids.shape[0], 1, 1) == ids.unsqueeze(2).repeat(1, 1, edge_index.shape[1])
# res.sum(1) is 2 where both coordinates match; // 2 keeps only those full matches
mask = ~(res.sum(1) // 2).sum(0).bool()
edge_index[:, mask]
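A minimal sketch of a fully vectorized mask for the original n1/n2 problem, assuming edge_index is an integer tensor of shape (2, E) and node IDs are non-negative ints. Option A relies on broadcasting instead of repeat; option B encodes each (x, y) pair as one integer key and tests membership with torch.isin (available in PyTorch 1.10 and later). The names pairs, match, num_nodes and the key encoding are illustrative, not from the posts. Option A allocates an E x P boolean tensor, so with very many pairs option B is likely the lighter one in memory.

```python
import torch

n1 = [0, 3, 4]
n2 = [1, 2, 1]
edge_index = torch.tensor([[1, 2, 0, 1, 3, 4, 2, 3, 1, 4, 2, 4],
                           [2, 1, 1, 0, 4, 3, 3, 2, 4, 1, 4, 2]])

# Pairs to remove, in both directions: shape (2, 2 * len(n1))
pairs = torch.tensor([n1, n2])
pairs = torch.cat([pairs, pairs.flip(0)], dim=1)

# Option A: broadcast (2, E, 1) against (2, 1, P) -> (2, E, P),
# require both coordinates to match, then check whether any pair matches.
match = (edge_index[:, :, None] == pairs[:, None, :]).all(dim=0)  # (E, P)
mask_a = ~match.any(dim=1)  # True for edges to keep

# Option B (assumes non-negative integer node ids):
# encode each (x, y) as the key x * num_nodes + y and test membership.
num_nodes = int(torch.cat([edge_index, pairs], dim=1).max()) + 1
edge_keys = edge_index[0] * num_nodes + edge_index[1]
drop_keys = pairs[0] * num_nodes + pairs[1]
mask_b = ~torch.isin(edge_keys, drop_keys)

print(edge_index[:, mask_a])
```

Either mask can be applied as edge_index[:, mask], which is the form the question asked for.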

numpy append in a for loop with different sizes

I have a for loop in which i increases by 2, and I want to save a value into a numpy array on each iteration, at a position that increases by 1.
import numpy as np

n = 8  # steps
# random sequence
rand_seq = np.zeros(n-1)
for i in range(0, (n-1)*2, 2):
    curr_state = i+3
I want to collect the curr_state values in the rand_seq array (seven values) so that I can use them outside the loop.
Can you help me with that?
Thanks a lot!
A much simpler version (if I understand the question correctly) would be:
np.arange(3, 15+1, 2)
where 3 is the start, 15+1 is the stop (arange excludes the stop value, hence the +1) and 2 is the step size.
In general, when using numpy, try to avoid adding elements in a for loop, as this is inefficient. I would suggest checking out the documentation of np.arange(), np.array() and np.zeros(); in my experience, these solve 90% of array-creation issues.
A straightforward list iteration:
In [313]: alist = []
     ...: for i in range(0,(8-1)*2,2):
     ...:     alist.append(i+3)
     ...:
In [314]: alist
Out[314]: [3, 5, 7, 9, 11, 13, 15]
or cast as a list comprehension:
In [315]: [i+3 for i in range(0,(8-1)*2,2)]
Out[315]: [3, 5, 7, 9, 11, 13, 15]
Or if you make an array with the same range parameters:
In [316]: arr = np.arange(0,(8-1)*2,2)
In [317]: arr
Out[317]: array([ 0, 2, 4, 6, 8, 10, 12])
you can add the 3 with one simple expression:
In [318]: arr + 3
Out[318]: array([ 3, 5, 7, 9, 11, 13, 15])
With lists, iteration and comprehensions are great. With numpy you should try to make an array, such as with arange, and modify that with whole-array methods (not with iterations).
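For completeness, a sketch that answers the question as literally asked: keep the loop, but write into the preallocated rand_seq using a separate position counter from enumerate (i // 2 would also work, since i steps by 2 from 0). The whole-array arange version is shown alongside for comparison. Using dtype=int for rand_seq is a choice made here; the question's float np.zeros(n-1) would hold the same values as floats.

```python
import numpy as np

n = 8  # steps, as in the question
rand_seq = np.zeros(n - 1, dtype=int)

# Option 1: keep the loop, with a separate counter for the array position
for pos, i in enumerate(range(0, (n - 1) * 2, 2)):
    rand_seq[pos] = i + 3  # pos goes 0, 1, 2, ... while i goes 0, 2, 4, ...

# Option 2: no Python loop at all
vectorized = np.arange(0, (n - 1) * 2, 2) + 3

print(rand_seq)
```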

Unexpected behavior when trying to normalize a column in numpy.array (version 1.17.4)

So, I was trying to normalize a specific column within a numpy array (i.e. divide each value by the column maximum, so that the maximum becomes 1).
I hoped this piece of code would do the trick:
>>> bar = np.arange(12).reshape(6,2)
>>> bar
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])
>>> bar[:,1] = bar[:,1] / bar[:,1].max()
>>> bar
array([[ 0,  0],
       [ 2,  0],
       [ 4,  0],
       [ 6,  0],
       [ 8,  0],
       [10,  1]])
It works as expected if the values are floats:
>>> foo = np.array([[1.1, 2.2],
...                 [3.3, 4.4],
...                 [5.5, 6.6]])
>>> foo[:,1] = foo[:,1] / foo[:,1].max()
>>> foo
array([[1.1       , 0.33333333],
       [3.3       , 0.66666667],
       [5.5       , 1.        ]])
So what I'm asking is: where does this default int come from, and what am I missing here?
(I'm taking this as a 'learning opportunity')
If you simply execute:
out = bar[:,1] / bar[:,1].max()
print(out)
# [0.09090909 0.27272727 0.45454545 0.63636364 0.81818182 1.        ]
It works fine here, since out is a newly created float array made to store these float values. But np.arange(12) gives you an int array by default, and bar[:,1] = bar[:,1] / bar[:,1].max() stores the float results back into that integer array, so they are truncated to integers and you get [0 0 0 0 0 1].
To create the array as float from the start:
bar = np.arange(12, dtype='float').reshape(6,2)
Alternatively, you can also use:
bar = np.arange(12).reshape(6,2).astype('float')
It isn't uncommon for us to need to change the data type of the array throughout the program, as you may not always need the dtype you define originally. So .astype() is actually pretty handy in all kinds of scenarios.
From the np.arange documentation:
dtype : dtype
The type of the output array. If dtype is not given, infer the data type from the other input arguments.
Since you passed int values, arange infers an integer dtype, and values stored into that array cannot become floats. If you want floats, you can pass a float start value instead:
bar = np.arange(12.0).reshape(6,2)
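A short sketch that reproduces the truncation and the fix side by side. The exact integer dtype of np.arange(12) is platform-dependent (commonly int64), so the code only checks that it is an integer type; int_copy and bar_f are illustrative names.

```python
import numpy as np

bar = np.arange(12).reshape(6, 2)
print(bar.dtype)  # an integer dtype; the exact width depends on the platform

# Assigning float results into an integer array silently truncates them.
int_copy = bar.copy()
int_copy[:, 1] = int_copy[:, 1] / int_copy[:, 1].max()

# Converting to float first keeps the fractional results.
bar_f = bar.astype(float)
bar_f[:, 1] = bar_f[:, 1] / bar_f[:, 1].max()
print(bar_f[:, 1])
```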

numpy: Cleanly retrieve coordinates (indices) for highest k values - along a specific axis - in ndarray

I would like to:
- select the k highest values along (or across?) the first dimension,
- find the indices of those k values,
- assign those values to a new ndarray of equal shape at their respective positions.
I'm wondering if there is a quicker way to achieve the result exemplified below. In particular, I would like to avoid making the batch indices "manually".
Here's my solution:
import numpy as np

k = 2  # number of highest values to keep per batch element
# Create unordered array (instrumental to the example)
arr = np.arange(24).reshape(2, 3, 4)
arr_1 = arr[0,::2].copy()
arr_2 = arr[1,1::].copy()
arr[0,::2] = arr_2[:,::-1]
arr[1,1:] = arr_1[:,::-1]
# reshape array to: (batch_size, H*W)
arr_batched = arr.reshape(arr.shape[0], -1)
# find indices for the k greatest values along all but the 1st dimension
gr_ind = np.argpartition(arr_batched, -k)[:, -k:]
# flatten and unravel indices.
maxk_ind_flat = gr_ind.flatten()
maxk_ind_shape = np.unravel_index(maxk_ind_flat, arr.shape)
# maxk_ind_shape prints: (array([0, 0, 0, 0]), array([2, 2, 0, 0]), array([1, 0, 2, 3]))
# note: unraveling indices obtained by partitioning an array of shape (2, n) does not take the first dimension into account (here [0,0,0,0])
# Craft batch indices...
batch_indices = np.repeat(np.arange(arr.shape[0]), k)
# ...and join
maxk_indices = tuple([batch_indices]+[ind for ind in maxk_ind_shape[1:]])
# The result is used to re-assign k-highest values for each batch element to a destination matrix:
arr2 = np.zeros_like(arr)
arr2[maxk_indices] = arr[maxk_indices]
# arr2 prints:
# array([[[ 0, 0, 0, 0],
# [ 0, 0, 0, 0],
# [23,22, 0, 0]],
#
# [[ 0, 0, 14, 15],
# [ 0, 0, 0, 0],
# [ 0, 0, 0, 0]]])
Any help would be appreciated.
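One way would be to use np.put_along_axis together with np.take_along_axis:
gr_ind = np.argpartition(arr_batched, -k, axis=-1)[:, -k:]
arr_2 = np.zeros_like(arr)
# arr_2 is freshly allocated (contiguous), so reshape returns a view and put_along_axis writes into arr_2
np.put_along_axis(arr_2.reshape(arr_batched.shape), gr_ind, np.take_along_axis(arr_batched, gr_ind, -1), -1)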
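A self-contained sketch of the put_along_axis answer on the question's example array, with k = 2 as implied by the question's printed output. As an illustrative addition (not part of either post), it also unravels the per-row flat indices with arr.shape[1:] rather than arr.shape, so the (row, col) coordinates come out per batch element, and a broadcast np.arange(...)[:, None] stands in for the hand-built batch indices. The names out, out_batched, rows, cols, batch and topk_values are hypothetical.

```python
import numpy as np

# Rebuild the unordered example array from the question
arr = np.arange(24).reshape(2, 3, 4)
arr_1 = arr[0, ::2].copy()
arr_2 = arr[1, 1::].copy()
arr[0, ::2] = arr_2[:, ::-1]
arr[1, 1:] = arr_1[:, ::-1]

k = 2
arr_batched = arr.reshape(arr.shape[0], -1)                 # (batch, H*W)
gr_ind = np.argpartition(arr_batched, -k, axis=-1)[:, -k:]  # (batch, k), unordered

# Copy the k largest values per batch element into a zero array of the same shape.
out = np.zeros_like(arr)
out_batched = out.reshape(arr_batched.shape)  # a view, since out is contiguous
np.put_along_axis(out_batched, gr_ind,
                  np.take_along_axis(arr_batched, gr_ind, axis=-1), axis=-1)

# Per-batch (row, col) coordinates, without repeating batch indices by hand:
rows, cols = np.unravel_index(gr_ind, arr.shape[1:])
batch = np.arange(arr.shape[0])[:, None]  # (batch, 1) broadcasts against (batch, k)
topk_values = arr[batch, rows, cols]      # same values as take_along_axis above

print(out)
```

argpartition does not sort the k selected entries, so the order of columns in rows, cols and topk_values may differ from a fully sorted ranking; the filled-in out array is unaffected by that order.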

Numpy fancy indexing with 2D array - explanation

I am (re)building up my knowledge of numpy, having used it a little while ago.
I have a question about fancy indexing with multidimensional (in this case 2D) arrays.
Given the following snippet:
>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> i = np.array( [ [0,1],                        # indices for the first dim of a
...                 [1,2] ] )
>>> j = np.array( [ [2,1],                        # indices for the second dim
...                 [3,3] ] )
>>>
>>> a[i,j]                                        # i and j must have equal shape
array([[ 2,  5],
       [ 7, 11]])
Could someone explain, in simple English, the logic being applied to produce these results? Ideally, the explanation would also apply to 3D and higher-rank arrays being used to index an array.
Conceptually (in terms of restrictions placed on "rows" and "columns"), what does it mean to index using a 2D array?
It means you are constructing a 2D array R such that R = A[B, C]. This means that each element is $r_{ij} = a_{b_{ij},\,c_{ij}}$.
So the item at R[0,0] is the item in A with row index B[0,0] and column index C[0,0]. The item R[0,1] is the item in A with row index B[0,1] and column index C[0,1], and so on.
So in this specific case:
>>> b = a[i,j]
>>> b
array([[ 2,  5],
       [ 7, 11]])
b[0,0] = 2 since i[0,0] = 0 and j[0,0] = 2, and thus a[0,2] = 2.
b[0,1] = 5 since i[0,1] = 1 and j[0,1] = 1, and thus a[1,1] = 5.
b[1,0] = 7 since i[1,0] = 1 and j[1,0] = 3, and thus a[1,3] = 7.
b[1,1] = 11 since i[1,1] = 2 and j[1,1] = 3, and thus a[2,3] = 11.
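The element-by-element rule above can be written out as an explicit loop, which is a handy way to check one's understanding. It is only a teaching sketch: a[i, j] computes the same result without any Python-level loop.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
i = np.array([[0, 1], [1, 2]])
j = np.array([[2, 1], [3, 3]])

# R[m, n] = a[i[m, n], j[m, n]] for every position (m, n) of the index arrays
R = np.empty(i.shape, dtype=a.dtype)
for m in range(i.shape[0]):
    for n in range(i.shape[1]):
        R[m, n] = a[i[m, n], j[m, n]]

print(R)
```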
So you can say that i determines the "row indices" and j determines the "column indices". The same idea holds in more dimensions: the first indexer supplies the indices along the first axis, the second indexer those along the second axis, and so on.
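A small sketch of the same rule for a 3D array (one index array per axis), plus a detail worth knowing: the index arrays do not strictly need identical shapes. They are broadcast against each other, and the result takes the broadcast shape. The names a3, rows and cols are illustrative.

```python
import numpy as np

a3 = np.arange(24).reshape(2, 3, 4)
# One index array per axis; all three have shape (2,)
i = np.array([0, 1])  # indices along axis 0
j = np.array([2, 0])  # indices along axis 1
k = np.array([3, 1])  # indices along axis 2
picked = a3[i, j, k]  # [a3[0, 2, 3], a3[1, 0, 1]]

# Index arrays are broadcast against each other:
a = np.arange(12).reshape(3, 4)
rows = np.array([[0], [2]])  # shape (2, 1)
cols = np.array([1, 3])      # shape (2,)
grid = a[rows, cols]         # shape (2, 2): [[a[0,1], a[0,3]], [a[2,1], a[2,3]]]

print(picked)
print(grid)
```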