Numpy Advanced Indexing : How the broadcast is happening? - numpy

array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
if we run the following statement
x[1:, [2,0,1]]
we get the following result
array([[ 6, 4, 5],
[10, 8, 9]])
According to numpy's doc:
Advanced indexes always are broadcast and iterated as one:
I am unable to understand how the indices are paired here, or how the broadcasting works.

The selected answer is not correct.
Here [2,0,1] does have shape (3,), and it is not extended during broadcasting.
The slice 1: is applied first: it selects rows 1 and 2 before any broadcasting takes place. During broadcasting, think of the slice as a placeholder for a 0-d scalar row index on each pass. So we get:
shape([2,0,1]) = (3,)
shape([:]) = () -> (1,) -> (3,)
So it is the slice that is conceptually extended to shape (3,), once per selected row, like this:
x[[1,1,1], [2,0,1]] =
[6 4 5]
x[[2,2,2], [2,0,1]] =
[10 8 9]
Finally, we need to stack the results back
[[6 4 5]
[10 8 9]]
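The row-by-row view above can be checked directly. This is a minimal sketch, assuming `x` is `np.arange(12).reshape(3, 4)` (which matches the array printed in the question); the names `cols`, `mixed` and `stacked` are illustrative only.

```python
import numpy as np

# Assumed reconstruction of the array from the question.
x = np.arange(12).reshape(3, 4)

cols = [2, 0, 1]

# Mixed slice + index array, as in the question.
mixed = x[1:, cols]

# The same result built row by row: each row selected by the slice
# is paired with every entry of the column index array.
row1 = x[[1, 1, 1], cols]
row2 = x[[2, 2, 2], cols]
stacked = np.stack([row1, row2])

print(mixed)
print(np.array_equal(mixed, stacked))  # True
```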

From NumPy User Guide, Section 3.4.7 Combining index arrays with slices
the slice is converted to an index array np.array that is broadcast
with the index array to produce the resultant array.
In our case the slice 1: is converted to an index array np.array([[1],[2]]), which has shape (2,1). This is the row index array.
The next index array (the column index array), np.array([2,0,1]), has shape (3,).
row index array shape (2,1)
column index array shape (3,)
The index arrays do not have the same shape, but they can be broadcast to a common shape, (2,3). The row index array is repeated across the columns, and the column index array is repeated for each row.
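For this two-dimensional case, the broadcast can be made explicit. The sketch below assumes `x` is `np.arange(12).reshape(3, 4)` and writes the slice out as a column of row indices with shape (2, 1); `np.broadcast_arrays` is used only to show the shapes that result.

```python
import numpy as np

# Assumed reconstruction of the array from the question.
x = np.arange(12).reshape(3, 4)

rows = np.array([[1], [2]])  # shape (2, 1): stands in for the slice 1:
cols = np.array([2, 0, 1])   # shape (3,)

# Broadcasting (2, 1) against (3,) gives two index arrays of shape (2, 3).
b_rows, b_cols = np.broadcast_arrays(rows, cols)
print(b_rows.shape, b_cols.shape)

# Indexing with the explicit arrays matches the slice form.
explicit = x[rows, cols]
print(np.array_equal(explicit, x[1:, [2, 0, 1]]))  # True
```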

Related

Assign numpy matrix to pandas columns

I have a dataframe with 48870 rows and calculated embeddings with shape (48870, 768).
I want to assign these embeddings to a pandas column.
When I try
test['original_text_embeddings'] = embeddings
I get an error: Wrong number of items passed 768, placement implies 1
I know that something like df.loc['original_text_embeddings'] = embeddings[0] will work, but I need to automate this process.
A dataframe/column needs a 1d list/array:
In [84]: x = np.arange(12).reshape(3,4)
In [85]: pd.Series(x)
...
ValueError: Data must be 1-dimensional
Splitting the array into a list (of arrays):
In [86]: pd.Series(list(x))
Out[86]:
0 [0, 1, 2, 3]
1 [4, 5, 6, 7]
2 [8, 9, 10, 11]
dtype: object
In [87]: _.to_numpy()
Out[87]:
array([array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8, 9, 10, 11])],
dtype=object)
Your embeddings have 768 columns, which would translate to equally 768 columns in a data frame. You are trying to assign all columns from the embeddings to just one column in the data frame, which is not possible.
What you could do is generate a new data frame from the embeddings and concatenate the test df with the embedding df:
embedding_df = pd.DataFrame(embeddings)
test = pd.concat([test, embedding_df], axis=1)
Have a look at the documentation for how indexes are handled and how to concatenate along different axes:
https://pandas.pydata.org/docs/reference/api/pandas.concat.html
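Both suggestions can be tried side by side on small stand-in data. This is a sketch under assumptions: the `text` column, the `emb_` prefix and the 3x4 array are made up for illustration, in place of the real 48870x768 embeddings.

```python
import numpy as np
import pandas as pd

# Small stand-ins for the real data (48870 rows, 768 columns).
test = pd.DataFrame({"text": ["a", "b", "c"]})
embeddings = np.arange(12, dtype=float).reshape(3, 4)

# Option 1: a single object column, each cell holding one 1-D array.
as_cells = test.copy()
as_cells["original_text_embeddings"] = list(embeddings)

# Option 2: one numeric column per embedding dimension.
# Reusing test.index keeps the rows aligned during the concat.
embedding_df = pd.DataFrame(embeddings, index=test.index).add_prefix("emb_")
as_columns = pd.concat([test, embedding_df], axis=1)

print(as_cells.shape)    # (3, 2)
print(as_columns.shape)  # (3, 5)
```

Option 1 keeps one column but stores Python objects, so vectorised numeric operations on that column are not available; option 2 keeps everything numeric at the cost of many columns.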

Numpy multi-dimensional array slicing

I have a 3-D NumPy array with shape (100, 50, 20). I was trying to slice the third dimension of the array by using the index, e.g., from 1 to 6 and from 8 to 10.
I tried the following code, but it kept reporting a syntax error.
newarr [:,:,1:10] = oldarr[:,:,[1:7,8:11]]
You can use np.r_ to concatenate slice objects:
newarr [:,:,1:10] = oldarr[:,:,np.r_[1:7,8:11]]
Example:
np.r_[1:4,6:8]
array([1, 2, 3, 6, 7])
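Applied to the shapes in the question, the same idea looks like the sketch below. The contents of `oldarr` are placeholder values; only its shape (100, 50, 20) comes from the question.

```python
import numpy as np

# Placeholder data with the shape given in the question.
oldarr = np.arange(100 * 50 * 20).reshape(100, 50, 20)

# np.r_ turns the two slices into one flat index array.
idx = np.r_[1:7, 8:11]
subset = oldarr[:, :, idx]

print(idx)           # [ 1  2  3  4  5  6  8  9 10]
print(subset.shape)  # (100, 50, 9)
```

The index array has nine entries, which is why it fits the nine-wide target slice `1:10` in the assignment above.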

Numpy, how to retrieve sub-array of array (specific indices)?

I have an array:
>>> arr1 = np.array([[1,2,3], [4,5,6], [7,8,9]])
array([[1 2 3]
[4 5 6]
[7 8 9]])
I want to retrieve a list (or 1d-array) of elements of this array by giving a list of their indices, like so:
indices = [[0,0], [0,2], [2,0]]
print(arr1[indices])
# result
[1,6,7]
But it does not work. I have been looking for a solution for a while, but I only found ways to select per row and/or per column (not by specific indices).
Does anyone have any idea?
Cheers
Aymeric
First make indices an array instead of a nested list:
indices = np.array([[0,0], [0,2], [2,0]])
Then, index the first dimension of arr1 using the first values of indices, likewise the second:
arr1[indices[:,0], indices[:,1]]
It gives array([1, 3, 7]) (which is correct, your [1, 6, 7] example output is probably a typo).
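A runnable version of this, together with an equivalent spelling that transposes `indices` and passes it as a tuple, might look like the following sketch (the names `rows`, `cols`, `picked` and `same` are illustrative):

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
indices = np.array([[0, 0], [0, 2], [2, 0]])

# Split the (row, col) pairs into one array per dimension.
rows, cols = indices.T
picked = arr1[rows, cols]

# Equivalent: a tuple of index arrays, one per dimension.
same = arr1[tuple(indices.T)]

print(picked)  # [1 3 7]
```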

Argmax indexing in pytorch with 2 tensors of equal shape

Summarize the problem
I am working with high-dimensional tensors in PyTorch and I need to index one tensor with the argmax values from another tensor. So I need to index tensor y of dim [3,4] with the results of the argmax of tensor x with dim [3,4]. If the tensors are:
import torch as T
# Tensor to get argmax from
# expected argmax: [2, 0, 1]
x = T.tensor([[1, 2, 8, 3],
[6, 3, 3, 5],
[2, 8, 1, 7]])
# Tensor to index with argmax from previous
# expected tensor to retrieve [2, 4, 9]
y = T.tensor([[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 10, 11]])
# argmax
x_max, x_argmax = T.max(x, dim=1)
I would like an operation that, given the argmax indexes of x (x_argmax), retrieves the values in tensor y at those same indexes.
Describe what you’ve tried
This is what I have tried:
# What I have tried
print(y[x_argmax])
print(y[:, x_argmax])
print(y[..., x_argmax])
print(y[x_argmax.unsqueeze(1)])
I have been reading a lot about numpy indexing, basic indexing, advanced indexing and combined indexing. I have been trying to use combined indexing (since I want a slice in first dimension of the tensor and the indexes values on the second one). But I have not been able to come up with a solution for this use case.
You are looking for torch.gather:
# T is torch, imported as in the question (import torch as T)
idx = T.argmax(x, dim=1, keepdim=True) # get argmax directly, w/o max
out = T.gather(y, 1, idx)
Resulting with
tensor([[2],
[4],
[9]])
How about y[T.arange(3), x_argmax]?
That does the job for me...
Explanation: You take dimensional information away when you invoke T.max(x, dim=1), so this information needs to be restored explicitly.
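Since the question mentions NumPy indexing, here is the same pairing written in NumPy rather than PyTorch, as a sketch only. `np.take_along_axis` is used as the NumPy counterpart of a gather along one axis; it is a swapped-in analogue, not the PyTorch call from the answers.

```python
import numpy as np

x = np.array([[1, 2, 8, 3],
              [6, 3, 3, 5],
              [2, 8, 1, 7]])
y = np.arange(12).reshape(3, 4)

x_argmax = x.argmax(axis=1)  # one column index per row

# Pair row i with column x_argmax[i]; the arange restores the row index
# that the reduction along axis 1 removed.
paired = y[np.arange(y.shape[0]), x_argmax]

# Gather-style alternative: the index needs a trailing axis of length 1,
# so the result keeps that axis too.
gathered = np.take_along_axis(y, x_argmax[:, None], axis=1)

print(paired)          # [2 4 9]
print(gathered.shape)  # (3, 1)
```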

Convert multidimensional array elements into same number of arrays

I am doing a Computer Vision project in which I get the error 'setting an array element with a sequence' when I try to change the data type of the input image matrix.
I realized this happens because the arrays in my input image matrix do not all have the same number of elements. Is there any way I can convert that input image into a 2D array with the same number of elements in each of its arrays?
I am getting an error when I am trying to execute the following line:
X_train = X_train.astype('float32')
Any help would be appreciated.
Cheers.
You need to pad the rows that have fewer elements with zeros, so that every row is as long as the longest array (or list) in the list of lists (matrix).
Below is a code snippet that pads a list of lists of unequal lengths to a matrix with equal row lengths:
import numpy as np
# Keep the ragged data as a plain list of lists: recent NumPy versions raise
# an error for np.array() on ragged input unless dtype=object is given.
unpadded_matrix = [[1, 2], [3, 4, 5], [6, 7, 8, 9]]
max_len = max(len(row) for row in unpadded_matrix)
np.array([row + [0]*(max_len-len(row)) for row in unpadded_matrix])
o/p:
array([[1, 2, 0, 0],
[3, 4, 5, 0],
[6, 7, 8, 9]])
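An alternative sketch of the same padding idea preallocates the zero-filled result and copies each row in, which also lets the target dtype (`float32`, as in the question's `astype` call) be set up front. The variable names are illustrative, and the rows are assumed to be plain Python lists.

```python
import numpy as np

# Ragged input, kept as a plain list of lists.
rows = [[1, 2], [3, 4, 5], [6, 7, 8, 9]]
max_len = max(len(row) for row in rows)

# Start from zeros so the padding is already in place.
padded = np.zeros((len(rows), max_len), dtype="float32")
for i, row in enumerate(rows):
    padded[i, :len(row)] = row  # copy the real values; the rest stay 0

print(padded.shape)  # (3, 4)
print(padded.dtype)  # float32
```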