NumPy multi-dimensional array slicing

I have a 3-D NumPy array with shape (100, 50, 20). I was trying to slice the third dimension of the array by using the index, e.g., from 1 to 6 and from 8 to 10.
I tried the following code, but it kept reporting a syntax error.
newarr [:,:,1:10] = oldarr[:,:,[1:7,8:11]]

You can use np.r_ to concatenate slice objects:
newarr [:,:,1:10] = oldarr[:,:,np.r_[1:7,8:11]]
Example:
np.r_[1:4,6:8]
array([1, 2, 3, 6, 7])
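A minimal sketch of the full assignment. The array contents and the zero-initialised newarr are assumptions for illustration; only the shape and the index ranges come from the question.

```python
import numpy as np

# Stand-in for the asker's data: shape (100, 50, 20), values chosen arbitrarily.
oldarr = np.arange(100 * 50 * 20).reshape(100, 50, 20)
newarr = np.zeros_like(oldarr)

# np.r_ expands the two slices into one integer index array.
idx = np.r_[1:7, 8:11]  # indices 1..6 and 8..10, nine in total

# Nine source positions are written into the nine target positions 1..9.
newarr[:, :, 1:10] = oldarr[:, :, idx]
```

The number of selected indices (nine here) has to match the width of the target slice, otherwise the assignment raises a shape mismatch error.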

Related

Assign numpy matrix to pandas columns

I have a dataframe with 48870 rows and computed embeddings with shape (48870, 768).
I want to assign these embeddings to a pandas column.
When I try
test['original_text_embeddings'] = embeddings
I get an error: Wrong number of items passed 768, placement implies 1
I know that something like df.loc['original_text_embeddings'] = embeddings[0] will work, but I need to automate this process.
A dataframe/column needs a 1d list/array:
In [84]: x = np.arange(12).reshape(3,4)
In [85]: pd.Series(x)
...
ValueError: Data must be 1-dimensional
Splitting the array into a list (of arrays):
In [86]: pd.Series(list(x))
Out[86]:
0 [0, 1, 2, 3]
1 [4, 5, 6, 7]
2 [8, 9, 10, 11]
dtype: object
In [87]: _.to_numpy()
Out[87]:
array([array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([ 8, 9, 10, 11])],
dtype=object)
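Applied to the question, the same idea might look like the sketch below. The tiny embeddings array and the original_text column are stand-ins chosen for illustration, not the asker's data.

```python
import numpy as np
import pandas as pd

# Small stand-ins for the asker's data (48870 rows, 768 features in the question).
embeddings = np.arange(12, dtype=float).reshape(3, 4)
test = pd.DataFrame({"original_text": ["a", "b", "c"]})

# list(embeddings) yields one 1-D array per row, so the Series is 1-dimensional
# with dtype=object; index=test.index keeps the rows aligned with the frame.
test["original_text_embeddings"] = pd.Series(list(embeddings), index=test.index)

# Recover the 2-D matrix from the object column when needed.
restored = np.stack(test["original_text_embeddings"].tolist())
```

An object column like this is convenient for keeping each embedding next to its row, but vectorised numeric operations need the matrix rebuilt first, as in the last line.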
Your embeddings have 768 columns, which would translate to 768 columns in a data frame. You are trying to assign all of those columns to a single column in the data frame, which is not possible.
What you could do is generate a new data frame from the embeddings and concatenate it with the test df:
embedding_df = pd.DataFrame(embeddings)
test = pd.concat([test, embedding_df], axis=1)
Have a look at the documentation for how pd.concat handles indexes and concatenates along different axes:
https://pandas.pydata.org/docs/reference/api/pandas.concat.html
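One thing to watch with this approach is index alignment: pd.concat with axis=1 matches rows by index label, not by position. The sketch below assumes test has a non-default index (for example after filtering); the emb_ column prefix and the sample data are made up for illustration.

```python
import numpy as np
import pandas as pd

embeddings = np.arange(12, dtype=float).reshape(3, 4)
# A non-default index, as would be left behind by filtering rows.
test = pd.DataFrame({"original_text": ["a", "b", "c"]}, index=[10, 20, 30])

# Reuse test's index so concat pairs row i of the embeddings with row i of test.
embedding_df = pd.DataFrame(embeddings, index=test.index).add_prefix("emb_")
combined = pd.concat([test, embedding_df], axis=1)
```

Without index=test.index, the embedding frame would get a fresh 0..n-1 index and the concatenation would produce extra rows filled with NaN wherever the labels differ.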

What does the [1] do when using .where()?

I'm practicing on a Data Cleaning Kaggle exercise.
In the parsing dates example I can't figure out what the [1] does at the end of the indices object.
Thanks.
# Finding indices corresponding to rows in different date format
indices = np.where([date_lengths == 24])[1]
print('Indices with corrupted data:', indices)
earthquakes.loc[indices]
As described in the documentation, numpy.where called with a single argument is equivalent to calling np.asarray([date_lengths == 24]).nonzero().
numpy.nonzero returns a tuple with one item per dimension of the input array, each holding the indices of the non-zero values along that dimension.
>>> np.nonzero([1,0,2,0])
(array([0, 2]),)
Indexing with [1] selects the second element of the tuple (i.e. the indices along the second dimension). Because the input was wrapped in [...], it gained an extra leading dimension, so this is equivalent to doing:
np.where(date_lengths == 24)[0]
>>> np.nonzero([1,0,2,0])[0]
array([0, 2])
It is an artefact of the extra [] around the condition. For example:
a = np.arange(10)
Finding the indices where a > 3, for example, can be done like this:
np.where(a > 3)
gives as output a tuple with one array
(array([4, 5, 6, 7, 8, 9]),)
So the indices can be obtained as
indices = np.where(a > 3)[0]
In your case, the condition is between [], which is unnecessary, but still works.
np.where([a > 3])
returns a tuple of two arrays: the first is an array of zeros (the row indices), and the second is the array of indices you want
(array([0, 0, 0, 0, 0, 0]), array([4, 5, 6, 7, 8, 9]))
so the indices are obtained as
indices = np.where([a > 3])[1]
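The difference between the two calls can be checked side by side. This is a small sketch using the same a as above; np.flatnonzero is shown as one possible alternative, not something the exercise uses.

```python
import numpy as np

a = np.arange(10)

plain = np.where(a > 3)      # condition is 1-D: tuple with one index array
wrapped = np.where([a > 3])  # extra [] makes it 2-D with shape (1, 10)

row_idx, col_idx = wrapped   # row indices are all 0, column indices are the hits

# np.flatnonzero returns the indices directly, with no tuple to unpack.
direct = np.flatnonzero(a > 3)
```

Dropping the extra brackets and taking [0], or using np.flatnonzero, avoids having to remember which tuple element holds the useful indices.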

Why does numpy.delete always delete one extra element in my test?

I want to use numpy.delete to delete certain elements in an array
import numpy as np
aa = np.array([1,2,3,4,5,6,7,8])
bb = np.array([0,0,0,0,0,0,0,0], dtype='bool')
np.delete(aa,bb)
gives me the results:
array([2, 3, 4, 5, 6, 7, 8])
I expect the results like this:
array([1, 2, 3, 4, 5, 6, 7, 8])
And if I change the bb to
bb = np.array([1,0,0,0,0,0,0,0], dtype='bool')
I got:
np.delete(aa,bb)
array([3, 4, 5, 6, 7, 8])
The code is simple, but I do not understand why numpy.delete behaves like this. Any explanations?
When I paste your code into a REPL, I get the intended output. I am using NumPy v1.19.4 and Python 3.8.5. Check whether an update for NumPy is available, and make sure that no later operation removes the first item of the array.
np.delete expects an array of ints as the indices to remove. If you pass bools, they are converted to ints (False = 0, True = 1). In your first example every entry becomes 0, so the element at index 0 is removed. In the second example the array contains a 1 and 0s, so the elements at indices 1 and 0 are both removed.
In the future numpy will not cast the booleans as integers.
FutureWarning: in the future insert will treat boolean arrays and array-likes as boolean index instead of casting it to integer
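To get mask semantics regardless of the NumPy version, one option is to avoid passing booleans to np.delete at all. The sketch below uses the arrays from the question; the variable names for the results are made up.

```python
import numpy as np

aa = np.array([1, 2, 3, 4, 5, 6, 7, 8])
bb = np.array([1, 0, 0, 0, 0, 0, 0, 0], dtype=bool)

# Boolean indexing: keep every element whose mask entry is False.
kept_by_mask = aa[~bb]

# Or convert the mask to integer positions before calling np.delete.
kept_by_delete = np.delete(aa, np.flatnonzero(bb))

# An all-False mask removes nothing.
nothing_removed = aa[~np.zeros(8, dtype=bool)]
```

Both forms state the intent explicitly, so the result does not depend on how a given np.delete version interprets boolean input.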

NumPy advanced indexing: how does the broadcasting happen?

Given the array x:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
If we run the following statement
x[1:, [2,0,1]]
we get the following result
array([[ 6, 4, 5],
[10, 8, 9]])
According to numpy's doc:
Advanced indexes always are broadcast and iterated as one:
I am unable to understand how the indices are paired here, or how the broadcasting happens.
The selected answer is not correct.
Here the [2,0,1] indeed has shape (3,) and will not be extended during broadcasting.
The 1: means the array is sliced first, before any broadcasting. During broadcasting, think of the slice as a placeholder for a 0-d scalar on each pass. So we get:
shape([2,0,1]) = (3,)
shape([:]) = () -> (1,) -> (3,)
So it is the [:] that is conceptually extended to shape (3,), like this:
x[[1,1,1], [2,0,1]] =
[6 4 5]
x[[2,2,2], [2,0,1]] =
[10 8 9]
Finally, we need to stack the results back
[[6 4 5]
[10 8 9]]
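The row-by-row picture above can be reproduced directly. This is a sketch assuming x is the 3x4 array of 0..11 shown in the question; the names row1, row2, stacked and mixed are illustrative.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
cols = [2, 0, 1]

# One pass per row selected by the slice 1:, i.e. rows 1 and 2.
row1 = x[[1, 1, 1], cols]
row2 = x[[2, 2, 2], cols]

# Stacking the per-row results gives the same array as the mixed index.
stacked = np.stack([row1, row2])
mixed = x[1:, cols]
```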
From NumPy User Guide, Section 3.4.7 Combining index arrays with slices
the slice is converted to an index array np.array that is broadcast
with the index array to produce the resultant array.
In our case the slice 1: is converted to an index array np.array([[1],[2]]), which has shape (2,1). This is the row index array.
The next index array (the column index array), np.array([2,0,1]), has shape (3,).
row index array shape (2,1)
column index array shape (3,)
The index arrays do not have the same shape, but they can be broadcast to a common shape, (2,3). The row index array is repeated across three columns and the column index array across two rows, which gives the (2,3) result.
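One way to check the shapes involved is to write the index arrays out explicitly. The sketch below assumes the slice 1: corresponds to a column of row indices with shape (2, 1), and that x is the 3x4 array from the question; np.ix_ is shown as a helper that builds such broadcastable index arrays.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

rows = np.array([[1], [2]])  # shape (2, 1): one row index per output row
cols = np.array([2, 0, 1])   # shape (3,): one column index per output column

# The two index arrays broadcast to the shape of the result.
common_shape = np.broadcast(rows, cols).shape

explicit = x[rows, cols]               # pure advanced indexing
mixed = x[1:, [2, 0, 1]]               # slice combined with an index list
via_ix = x[np.ix_([1, 2], [2, 0, 1])]  # np.ix_ builds the (2, 1) and (1, 3) arrays
```

All three expressions select the same elements; the explicit form just makes the broadcasting visible.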

numpy array.shape behaviour

For the following:
d = np.array([[0,1,4,3,2],[10,18,4,7,5]])
print(d.shape)
Output is:
(2, 5)
This is expected.
But for this (the rows have different numbers of elements):
d = np.array([[0,1,4,3,2],[10,18,4,7]])
print(d.shape)
Output is:
(2,)
How to explain this behaviour?
Short answer: It parses it as an array of two objects: two lists.
NumPy is designed to process "rectangular" data. If you pass it non-rectangular data, the np.array(..) function falls back to treating it as an array of objects.
Indeed, take a look at the dtype of the array here:
>>> d
array([list([0, 1, 4, 3, 2]), list([10, 18, 4, 7])], dtype=object)
It is a one-dimensional array that contains two items: two lists. These lists are simply objects.
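The two cases can be compared in a short sketch. Note the explicit dtype=object on the ragged input: to my knowledge, recent NumPy releases (1.24 and later) raise a ValueError for ragged input instead of falling back silently, so the sketch passes it to behave the same across versions. The variable names are illustrative.

```python
import numpy as np

# Rectangular input: a regular 2-D integer array.
rect = np.array([[0, 1, 4, 3, 2], [10, 18, 4, 7, 5]])

# Ragged input: dtype=object requests the array-of-lists result explicitly.
ragged = np.array([[0, 1, 4, 3, 2], [10, 18, 4, 7]], dtype=object)

# Each element of the ragged array is an ordinary Python list.
row_lengths = [len(row) for row in ragged]
```

Because the ragged array only holds references to lists, NumPy's vectorised arithmetic and multi-dimensional indexing do not reach inside the rows.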