Reversing the order of the second column only in a 3x2 ND-array - numpy

Right now I have a 3x2 ND-array, namely np.array([[93, 95], [84, 100], [99, 87]]). I would like to reverse only the second column of the array, so that I get: np.array([[93, 87], [84, 100], [99, 95]]).
I tried the following code:
grades = np.array([[93, 95], [84, 100], [99, 87]])
print(grades[::-1,:])
However, the result I get is
[[ 99 87]
[ 84 100]
[ 93 95]]
I understand that this is because I am reversing all of the rows along axis 0, which is why the entries in the first column are also reversed. So what code can I write to get:
[[ 93 87]
[ 84 100]
[ 99 95]]

Use the numpy.flip function to reverse the order of values in specific column(s); it is best suited to more general cases:
grades = np.array([[93, 95], [84, 100], [99, 87], [99, 83]])
grades[:, -1] = np.flip(grades[:, -1])
print(grades)
Or, instead of the np.flip line, assign the same column taken from the reversed rows:
grades[:, -1] = grades[::-1, -1]
Either approach prints:
[[ 93 83]
[ 84 87]
[ 99 100]
[ 99 95]]
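If the original array should stay untouched, a minimal sketch (assuming an extra copy is acceptable; `result` is only an illustrative name) applies the same assignment to a copy, using the 3-row array from the question:

```python
import numpy as np

grades = np.array([[93, 95], [84, 100], [99, 87]])

# Work on a copy so that `grades` itself is not modified.
result = grades.copy()
# Overwrite only the second column with its reversed values.
result[:, 1] = grades[::-1, 1]

print(result)
```

The first column is never indexed on the left-hand side, so it keeps its original order.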


Create nested array for all unique indices in a pandas MultiIndex DataFrame

Generate dummy data:
import numpy as np
import pandas as pd
np.random.seed(42)
df = pd.DataFrame({'subject': ['A'] * 10 + ['B'] * 10,
'trial': list(range(5)) * 4,
'value1': np.random.randint(0, 100, 20),
'value2': np.random.randint(0, 100, 20)
})
df = df.set_index(['subject', 'trial']).sort_index()
print(df)
value1 value2
subject trial
A 0 51 1
0 20 75
1 92 63
1 82 57
2 14 59
2 86 21
3 71 20
3 74 88
4 60 32
4 74 48
B 0 87 90
0 52 79
1 99 58
1 1 14
2 23 41
2 87 61
3 2 91
3 29 61
4 21 59
4 37 46
Note: each subject / trial combination has multiple rows.
I want to create an array with the rows as nested dimensions.
My (admittedly ugly) data transformation via a list:
tmp=list()
for idx in df.index.unique():
tmp.append(df.loc[idx].to_numpy())
goal = np.array(tmp)
print(goal)
[[[51 1]
[20 75]]
...
[[21 59]
[37 46]]]
Can you show me a native pandas / numpy way to do it (without the list crutch)?
To generate a non-ragged numpy array, every index value must have the same number of duplicate rows. You therefore don't have to loop over them: compute that number and reshape. This relies on the index being sorted, as it is after sort_index.
n = len(df)/(~df.index.duplicated()).sum()
assert n.is_integer()
# target shape: (unique index values, rows per index value, columns)
out = df.to_numpy().reshape(-1, int(n), df.shape[1])
Output:
array([[[51, 1],
[20, 75]],
[[92, 63],
[82, 57]],
[[14, 59],
[86, 21]],
[[71, 20],
[74, 88]],
[[60, 32],
[74, 48]],
[[87, 90],
[52, 79]],
[[99, 58],
[ 1, 14]],
[[23, 41],
[87, 61]],
[[ 2, 91],
[29, 61]],
[[21, 59],
[37, 46]]])
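As a hedged check of the reshape idea, the sketch below uses hypothetical data (3 rows per subject/trial combination and 2 value columns, so the two trailing dimensions differ) and compares the reshaped array with a per-group reference. The names `rows_per_group` and `reference` are illustrative, and the approach assumes a sorted index with an equal number of rows per combination:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 2 subjects x 2 trials, 3 rows per combination, 2 columns.
index = pd.MultiIndex.from_product(
    [["A", "B"], [0, 1], range(3)], names=["subject", "trial", "rep"]
).droplevel("rep")
df = pd.DataFrame(
    np.arange(24).reshape(12, 2), index=index, columns=["value1", "value2"]
)

# Number of distinct (subject, trial) pairs and rows per pair.
n_groups = int((~df.index.duplicated()).sum())
rows_per_group, remainder = divmod(len(df), n_groups)
assert remainder == 0, "row count is not a multiple of the group count"

# Shape: (groups, rows per group, columns).
out = df.to_numpy().reshape(-1, rows_per_group, df.shape[1])

# Reference built group by group, only to verify the reshape.
reference = np.array(
    [g.to_numpy() for _, g in df.groupby(level=["subject", "trial"])]
)
print(out.shape, np.array_equal(out, reference))
```

The divisibility check alone does not prove that every combination has the same number of rows, so the comparison with `reference` is a useful sanity test on real data.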
You can use stack (note that the output shown below does not correspond to the dummy data above):
df.stack().values
Output:
array([[ 0, 25],
[16, 11],
[49, 87],
[38, 77],
[67, 6],
[27, 27],
[40, 0],
[22, 81],
[83, 89],
[36, 55],
[41, 1],
[13, 74],
[88, 61],
[85, 73],
[55, 66],
[44, 82],
[20, 30],
[82, 69],
[37, 71],
[30, 16],
[81, 96],
[ 0, 56],
[ 5, 99],
[73, 86]], dtype=int64)

Descending sorting in numpy by several columns [duplicate]

I have a NumPy array and need to sort it by two columns (first by column 0, then breaking ties by column 1), both in descending order. When I sort sequentially by column 1 and then by column 0, rows that tie on column 0 end up in ascending order of column 1.
My array:
arr = np.array([
[150, 8],
[105, 20],
[90, 100],
[101, 12],
[110, 80],
[105, 100],
])
When I sort twice (by column 1 and column 0):
arr = arr[arr[:,1].argsort(kind='stable')[::-1]]
arr = arr[arr[:,0].argsort(kind='stable')[::-1]]
I get this result (where the rows at indices 2 and 3 are swapped relative to what I want):
array([[150, 8],
[110, 80],
[105, 20],
[105, 100],
[101, 12],
[ 90, 100]])
As far as I understand, this happens because stable mode preserves the original order of equal values, but flipping the indices to make the order descending reverses that preserved order too.
The results I'd like to have:
array([[150, 8],
[110, 80],
[105, 100],
[105, 20],
[101, 12],
[ 90, 100]])
Use numpy.lexsort to sort on multiple columns at the same time.
arr = np.array([
[150, 8],
[105, 20],
[90, 100],
[101, 12],
[110, 80],
[105, 100],
])
order = np.lexsort([arr[:, 1], arr[:, 0]])[::-1]
arr[order]
yields:
array([[150, 8],
[110, 80],
[105, 100],
[105, 20],
[101, 12],
[ 90, 100]])
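A possible variation, sketched under the assumption that all keys are signed numeric values (so negation is safe), negates the keys instead of reversing the result. This also makes it easy to mix directions by negating only some keys:

```python
import numpy as np

arr = np.array([
    [150, 8],
    [105, 20],
    [90, 100],
    [101, 12],
    [110, 80],
    [105, 100],
])

# np.lexsort treats the LAST key as the primary one.
# Negating a key turns its ascending order into descending order.
order = np.lexsort((-arr[:, 1], -arr[:, 0]))
sorted_desc = arr[order]
print(sorted_desc)
```

For unsigned integers or non-numeric keys, negation does not apply, and reversing the lexsort result as in the answer above remains the simpler option.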

How does NumPy calculate inner product of two 2D matrices?

I'm unable to understand how NumPy calculates the inner product of two 2D matrices.
For example, this program:
mat = [[1, 2, 3, 4],
[5, 6, 7, 8]]
result = np.inner(mat, mat)
print('\n' + 'result: ')
print(result)
print('')
produces this output:
result:
[[ 30 70]
[ 70 174]]
How are these numbers calculated?
Before somebody says "read the documentation": I did (https://numpy.org/doc/stable/reference/generated/numpy.inner.html), and it's not clear to me from it how this result is calculated.
Before somebody says "check the Wikipedia article": I did. https://en.wikipedia.org/wiki/Frobenius_inner_product shows various math symbols I'm not familiar with and does not explain how a calculation such as the one above is performed.
Before somebody says "Google it": I did. Most examples are for 1-d arrays (which is an easy calculation), and others, like this video https://www.youtube.com/watch?v=_YtHyjcQ1gw, produce a different result than NumPy does.
Any clarification would be greatly appreciated.
In [55]: mat = [[1, 2, 3, 4],
...: [5, 6, 7, 8]]
...:
In [56]: arr = np.array(mat)
In [58]: arr.dot(arr.T)
Out[58]:
array([[ 30, 70],
[ 70, 174]])
That's a matrix product of a (2,4) with a (4,2), resulting in a (2,2). This is the usual 'scan across the columns, down the rows' method.
A couple of other expressions that do this:
I like the expressiveness of einsum, where the sum-of-products is on the j dimension:
In [60]: np.einsum('ij,kj->ik',arr,arr)
Out[60]:
array([[ 30, 70],
[ 70, 174]])
With broadcasted elementwise multiplication and summation:
In [61]: (arr[:,None,:]*arr[None,:,:]).sum(axis=-1)
Out[61]:
array([[ 30, 70],
[ 70, 174]])
Without the sum, the products are:
In [62]: (arr[:,None,:]*arr[None,:,:])
Out[62]:
array([[[ 1, 4, 9, 16],
[ 5, 12, 21, 32]],
[[ 5, 12, 21, 32],
[25, 36, 49, 64]]])
Which are the values you discovered.
I finally found this site https://www.tutorialspoint.com/numpy/numpy_inner.htm which explains things a little better. The above is computed as follows:
result[0][0] = (1*1)+(2*2)+(3*3)+(4*4) = 1 + 4 + 9 + 16 = 30
result[0][1] = (1*5)+(2*6)+(3*7)+(4*8) = 5 + 12 + 21 + 32 = 70
result[1][0] = (5*1)+(6*2)+(7*3)+(8*4) = 5 + 12 + 21 + 32 = 70
result[1][1] = (5*5)+(6*6)+(7*7)+(8*8) = 25 + 36 + 49 + 64 = 174
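The same arithmetic can be written as explicit loops. This is only an illustrative sketch (the name `manual` is made up here) of what np.inner computes for two 2-D inputs: entry [i][k] is the sum of products of row i of the first argument with row k of the second.

```python
import numpy as np

mat = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8]])

# manual[i, k] = sum over j of mat[i, j] * mat[k, j]
manual = np.empty((mat.shape[0], mat.shape[0]), dtype=mat.dtype)
for i, row_i in enumerate(mat):
    for k, row_k in enumerate(mat):
        manual[i, k] = sum(x * y for x, y in zip(row_i, row_k))

print(manual)
print(np.array_equal(manual, np.inner(mat, mat)))
```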

Numpy: Select by index array along an axis

I'd like to select elements from an array along a specific axis given an index array. For example, given the arrays
a = np.arange(30).reshape(5,2,3)
idx = np.array([0,1,1,0,0])
I'd like to select from the second dimension of a according to idx, such that the resulting array is of shape (5,3). Can anyone help me with that?
You could use fancy indexing:
a[np.arange(5),idx]
Output:
array([[ 0, 1, 2],
[ 9, 10, 11],
[15, 16, 17],
[18, 19, 20],
[24, 25, 26]])
Written out more explicitly, this is the same as:
x,y,z = np.arange(a.shape[0]), idx, slice(None)
a[x,y,z]
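x and y both have shape (5,) and are broadcast together, so they pick one (row, idx) pair per entry; z is a full slice that keeps the whole last axis, which gives a result of shape (5, 3). Replacing z with another slice or index would select only some of the columns in the output.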
I think this gives the results you are after - it uses np.take_along_axis, but first you need to reshape your idx array so that it is also a 3d array:
a = np.arange(30).reshape(5, 2, 3)
idx = np.array([0, 1, 1, 0, 0]).reshape(5, 1, 1)
results = np.take_along_axis(a, idx, 1).reshape(5, 3)
Giving:
[[ 0 1 2]
[ 9 10 11]
[15 16 17]
[18 19 20]
[24 25 26]]
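As a quick, hedged comparison of the two answers, the sketch below runs both on the arrays from the question. It uses `idx[:, None, None]` and `squeeze` in place of the explicit reshapes, which is a stylistic assumption rather than a requirement:

```python
import numpy as np

a = np.arange(30).reshape(5, 2, 3)
idx = np.array([0, 1, 1, 0, 0])

# Fancy indexing: pair each position on axis 0 with its entry in idx.
fancy = a[np.arange(a.shape[0]), idx]

# take_along_axis: idx needs the same number of dimensions as a.
taken = np.take_along_axis(a, idx[:, None, None], axis=1).squeeze(axis=1)

print(fancy.shape, np.array_equal(fancy, taken))
```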

Generating boolean dataframe based on contents in series and dataframe

I have:
df = pd.DataFrame(
[
[22, 33, 44],
[55, 11, 22],
[33, 55, 11],
],
index=["abc", "def", "ghi"],
columns=list("abc")
) # size(3,3)
and:
unique = pd.Series([11, 22, 33, 44, 55]) # size(1,5)
then I create a new df based on unique and df, so that:
df_new = pd.DataFrame(index=unique, columns=df.columns) # size(5,3)
From this newly created df, I'd like to create a new boolean df based on unique and df, so that the end result is:
df_new = pd.DataFrame(
[
[0, 1, 1],
[1, 0, 1],
[1, 1, 0],
[0, 0, 1],
[1, 1, 0],
],
index=unique,
columns=df.columns
)
This new df is either true or false depending on whether the value is present in the original dataframe or not. For example, the first column has three values: [22, 55, 33]. In a df with dimensions (5,3), this first column would be: [0, 1, 1, 0, 1] i.e. [0, 22, 33, 0 , 55]
I tried filter2 = unique.isin(df), but this doesn't work; neither does notnull. I tried applying a filter, but the dimensions returned were incorrect. How can I do this?
Use DataFrame.stack with Series.reset_index (stack returns a Series here) and DataFrame.pivot, then test for non-missing values with DataFrame.notna, cast to integers to map True->1 and False->0, and finally remove the index and column names with DataFrame.rename_axis. The pivot arguments are passed by keyword because recent pandas versions no longer accept them positionally:
df_new = (df.stack()
.reset_index(name='v')
.pivot(index='v', columns='level_1', values='level_0')
.notna()
.astype(int)
.rename_axis(index=None, columns=None))
print (df_new)
a b c
11 0 1 1
22 1 0 1
33 1 1 0
44 0 0 1
55 1 1 0
The helper Series is not necessary, but if it contains additional values, or if you need the output in the order it defines, add DataFrame.reindex:
#added 66
unique = pd.Series([11, 22, 33, 44, 55,66])
df_new = (df.stack()
.reset_index(name='v')
.pivot(index='v', columns='level_1', values='level_0')
.reindex(unique)
.notna()
.astype(int)
.rename_axis(index=None, columns=None))
print (df_new)
a b c
11 0 1 1
22 1 0 1
33 1 1 0
44 0 0 1
55 1 1 0
66 0 0 0
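An alternative sketch, not taken from the answer above, stays closer to the asker's original isin attempt by testing the helper Series against one column at a time. It assumes the values in `unique` are distinct:

```python
import pandas as pd

df = pd.DataFrame(
    [
        [22, 33, 44],
        [55, 11, 22],
        [33, 55, 11],
    ],
    index=["abc", "def", "ghi"],
    columns=list("abc"),
)
unique = pd.Series([11, 22, 33, 44, 55])

# For each column, flag which of the unique values occur in it.
df_new = pd.DataFrame(
    {col: unique.isin(df[col]).to_numpy() for col in df.columns},
    index=unique.to_numpy(),
).astype(int)

print(df_new)
```

Values in `unique` that occur nowhere in `df` simply produce a row of zeros, so no separate reindex step is needed in this variant.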