For loop to obtain sum and mean on np 3d array - numpy

I have the following array
arr = np.array([[[1, 2, 3], [4, 5, 6]],
[[7, 8, 9], [10, 11, 12]]])
I want to go through each element and sum on axis 0, so I do:
lst = []
for x in arr:
    for y in np.sum(x, axis=0):
        lst.append(y)
where now the lst is
[5, 7, 9, 17, 19, 21]
However I want the output to be in the following form:
[[5, 7, 9], [17, 19, 21]]
to then take the mean of its axis 0 namely (5+17)/2 and so on. The final output should look like
[11., 13., 15.]
How can I do this? Is it possible to write the whole operation in a compact form, such as a list comprehension?
Update: To get the final output I can do:
np.mean(np.reshape(lst, (len(arr),-1)),axis=0)
Yet I am sure there is a more Pythonic way of doing this.

In [5]: arr = np.array([[[1, 2, 3], [4, 5, 6]],
...: [[7, 8, 9], [10, 11, 12]]])
In [7]: arr
Out[7]:
array([[[ 1, 2, 3],
[ 4, 5, 6]],
[[ 7, 8, 9],
[10, 11, 12]]])
The for iterates on the 1st dimension, as though it was a list of arrays:
In [8]: for x in arr:print(x)
[[1 2 3]
[4 5 6]]
[[ 7 8 9]
[10 11 12]]
list(arr) also makes a list of these 2d arrays (but it is slower than arr.tolist()).
One common way of iterating on other dimensions is to use an index:
In [10]: for i in range(2):print(arr[:,i])
[[1 2 3]
[7 8 9]]
[[ 4 5 6]
[10 11 12]]
You could also transpose the array placing the desired axis first.
But you don't need to iterate
In [13]: arr.sum(axis=1)
Out[13]:
array([[ 5, 7, 9],
[17, 19, 21]])
In [14]: arr.sum(axis=1).mean(axis=0)
Out[14]: array([11., 13., 15.])
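Since the question asked about a list comprehension, here is a small sketch comparing that form with the vectorized one. The variable names block_sums, mean_from_list and mean_vectorized are only illustrative:

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]],
                [[7, 8, 9], [10, 11, 12]]])

# List-comprehension form: sum each 2d block over its rows, then average.
block_sums = [x.sum(axis=0) for x in arr]
mean_from_list = np.mean(block_sums, axis=0)

# Fully vectorized form: no Python-level loop at all.
mean_vectorized = arr.sum(axis=1).mean(axis=0)

print(mean_from_list)
print(mean_vectorized)
```

Both give the same values; the vectorized form is usually the better choice for large arrays.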

Related

Why is the correlation one when values differ?

I have a dataframe book_matrix with users as rows, books as columns, and ratings as values. When I use corrwith() to compute the correlation between 'The Lord of the Rings' and 'The Silmarillion' the result is 1.0, but the values are clearly different.
The non-null values [10, 3] and [10, 9] have correlation 1.0. I would expect them to be exactly the same when the correlation is equal to one. How can this happen?
Correlation measures how closely two sets of values follow a linear relationship with one another; it does not require the values to be equal. Here's an illustration:
import pandas as pd
df1 = pd.DataFrame({"A":[1, 2, 3, 4],
"B":[5, 8, 4, 3],
"C":[10, 4, 9, 3]})
df2 = pd.DataFrame({"A":[2, 4, 6, 8],
"B":[-5, -8, -4, -3],
"C":[4, 3, 8, 5]})
df1.corrwith(df2, axis=0)
A 1.000000
B -1.000000
C 0.395437
dtype: float64
So you can see that [1, 2, 3, 4] and [2, 4, 6, 8] have correlation 1.0.
In the next column, [5, 8, 4, 3] and [-5, -8, -4, -3] have the extreme negative correlation -1.0.
In the last column, [10, 4, 9, 3] and [4, 3, 8, 5] are only somewhat correlated (0.395437): both show a high-low-high-low sequence, but with varying vertical scaling factors.
So in your case, 'The Lord of the Rings' and 'The Silmarillion' have only 2 ratings each in common, and both pairs follow a high-low sequence. Two points always lie exactly on a straight line, so the correlation comes out as 1.0. Even if I illustrate with more data points, the result stays the same as long as they keep the same vertical scaling factor.
df1 = pd.DataFrame({"A": [10, 3, 10, 3, 10, 3],
"B": [10, 3, 10, 3, 10, 3]})
df2 = pd.DataFrame({"A": [10, 9, 10, 9, 10, 9],
"B": [10, 10, 10, 9, 9, 9]})
df1.corrwith(df2, axis=0)
A 1.000000
B 0.333333
dtype: float64
So you can see that [10, 3, 10, 3, 10, 3] and [10, 9, 10, 9, 10, 9] are also correlated perfectly at 1.0.
But if I rearrange the sequence a little, [10, 3, 10, 3, 10, 3] and [10, 10, 10, 9, 9, 9] are not perfectly correlated anymore at 0.333333
So going forward, you need more data, and more variations in the data! Hope that helps 😎
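To make the two-ratings case concrete, here is a minimal sketch using np.corrcoef instead of corrwith (my substitution, to keep the example small). Only the first pair comes from the question; the other two are made-up values:

```python
import numpy as np

# Two-point pairs; only the first one is taken from the question.
pairs = [
    ([10, 3], [10, 9]),
    ([1, 5], [100, 101]),
    ([2, 7], [9, 4]),
]

# With two points per series, both series either move in the same
# direction (+1) or in opposite directions (-1); nothing in between.
correlations = [np.corrcoef(x, y)[0, 1] for x, y in pairs]
print(correlations)
```

With only two observations the correlation carries no information beyond the direction of change, which is why more data is needed.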

Numpy get column of two dimensional matrix as array

I have a matrix that looks like that:
>> X
>>
[[5.1 1.4]
[4.9 1.4]
[4.7 1.3]
[4.6 1.5]
[5. 1.4]]
I want to get its first column as an array of [5.1, 4.9, 4.7, 4.6, 5.]
However, when I try to get it with X[:,0] I get
>> [[5.1]
[4.9]
[4.7]
[4.6]
[5. ]]
which is something different. How can I get it as a 1d array?
You can use a list comprehension for this kind of thing:
import numpy as np
X = np.array([[5.1, 1.4], [4.9, 1.4], [4.7, 1.3], [4.6, 1.5], [5.0, 1.4]])
X_0 = [i for i in X[:,0]]
print(X_0)
Output:
[5.1, 4.9, 4.7, 4.6, 5.0]
Almost there! Just reshape your result (note that this gives a single 2d row rather than a 1d array):
X[:,0].reshape(1,-1)
Outputs:
[[5.1 4.9 4.7 4.6 5. ]]
Full code:
import numpy as np
X=np.array([[5.1 ,1.4],[4.9 ,1.4], [4.7 ,1.3], [4.6 ,1.5], [5. , 1.4]])
print(X)
print(X[:,0].reshape(1,-1))
With regular numpy array:
In [3]: x = np.arange(15).reshape(5,3)
In [4]: x
Out[4]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
In [5]: x[:,0]
Out[5]: array([ 0, 3, 6, 9, 12])
With np.matrix (use discouraged if not actually deprecated)
In [6]: X = np.matrix(x)
In [7]: X
Out[7]:
matrix([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
In [8]: print(X)
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]
[12 13 14]]
In [9]: X[:,0]
Out[9]:
matrix([[ 0],
[ 3],
[ 6],
[ 9],
[12]])
In [10]: X[:,0].T
Out[10]: matrix([[ 0, 3, 6, 9, 12]])
To get 1d array, convert to array and ravel, or in one step:
In [11]: X[:,0].A1
Out[11]: array([ 0, 3, 6, 9, 12])
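A short sketch comparing the two routes mentioned above, the two-step conversion and the one-step .A1 attribute. The variable names are only illustrative:

```python
import numpy as np

x = np.arange(15).reshape(5, 3)
X = np.matrix(x)

# Two-step route: drop the matrix subclass, then flatten to 1d.
col_two_step = np.asarray(X[:, 0]).ravel()

# One-step route: .A1 returns the flattened data as a plain ndarray.
col_one_step = X[:, 0].A1

print(col_two_step, col_two_step.shape)
print(col_one_step, col_one_step.shape)
```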

Calculate statistics of one numpy array based on the values in a second numpy array

Let's say I have a 2-d numpy array
a = np.array([[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]])
and a 3-d numpy array like
b = np.array([[[1, 2, 8, 8],
[3, 4, 8, 8],
[8, 7, 0, 1],
[6, 5, 3, 2]],
[[1, 1, 1, 3],
[1, 1, 4, 2],
[0, 3, 2, 1],
[3, 2, 3, 9]]])
I want to calculate the statistics (mean, median, majority, sum, count,...) of b according to the "IDs" in a.
Example: sum should result in another array (or a list if that is easier), that gives the sum of the values in b. There are 4 unique "IDs" in a: 1,2,3,4, and 2 'layers' in b. For the 1's in a that is a sum of 10 (layer 0) and 4 (layer 1). For the 2's it's 32 (layer 0) and 10 (layer 1), and so on...
Expected result for sum:
sums = [[1, 10, 4],
[2, 32, 10],
[3, 26, 8],
[4, 6, 15]]
Expected result for mean:
avgs = [[1, 2.5, 1.0 ],
[2, 8.0, 2.5 ],
[3, 6.5, 2.0 ],
[4, 1.5, 3.75]]
My guess is that there is a handy function in numpy that does this already, but I am not sure what to search for exactly. Any pointers on how to do it, or what to search for, are much appreciated.
Update:
I came up with this for-loop, which is fine for very small arrays. However, my arrays are much larger than 4 by 4 and a faster implementation is needed.
result = []
ids = np.unique(a)
for id in ids:
    line = [id]
    for band in range(0, b.shape[0]):
        cell = b[band][np.where(a == id)]
        line.append(cell.mean())
        # line.append(cell.min())
        # line.append(cell.max())
        # line.append(cell.std())
        line.append(cell.sum())
        line.append(np.median(cell))
    result.append(line)
You can try the code below
cal_sums = [[b[j, :, :][np.argwhere(a==i)[:,0],np.argwhere(a==i)[:,1]].sum()
for i in np.unique(a)] for j in range(2)]
cal_mean = [[b[j, :, :][np.argwhere(a==i)[:,0],np.argwhere(a==i)[:,1]].mean()
for i in np.unique(a)] for j in range(2)]
sums = np.zeros((np.unique(a).size, b.shape[0]+1))
means = np.zeros((np.unique(a).size, b.shape[0]+1))
sums[:, 0] , sums[:,1:] = np.unique(a), np.asarray(cal_sums).T
means[:, 0] , means[:,1:] = np.unique(a), np.asarray(cal_mean).T
print(sums)
[[ 1. 10. 4.]
[ 2. 32. 10.]
[ 3. 26. 8.]
[ 4. 6. 15.]]
print(means)
[[1. 2.5 1. ]
[2. 8. 2.5 ]
[3. 6.5 2. ]
[4. 1.5 3.75]]
I tested it on a fairly large array and it runs quickly:
n = 1000
a = np.random.randint(1, 5, size=(n, n))
b = np.random.randint(1, 10, size=(2, n, n))
speed:
377 ms ± 3.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
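Another possible approach, sketched here under the assumption that only sums, counts and means are needed (median or majority would need something else), is to turn the IDs into integer labels with np.unique and accumulate with np.bincount. The names labels, counts, sums and means are just illustrative, and I have not timed this against the code above:

```python
import numpy as np

a = np.array([[1, 1, 2, 2],
              [1, 1, 2, 2],
              [3, 3, 4, 4],
              [3, 3, 4, 4]])
b = np.array([[[1, 2, 8, 8],
               [3, 4, 8, 8],
               [8, 7, 0, 1],
               [6, 5, 3, 2]],
              [[1, 1, 1, 3],
               [1, 1, 4, 2],
               [0, 3, 2, 1],
               [3, 2, 3, 9]]])

# Map each ID in a to a label 0..k-1; flatten first so the labels are 1d.
ids, labels = np.unique(a.ravel(), return_inverse=True)

# Number of cells that belong to each ID.
counts = np.bincount(labels)

# One weighted bincount per layer of b gives the per-ID sums (as floats).
sums = np.stack([np.bincount(labels, weights=layer.ravel()) for layer in b], axis=1)
means = sums / counts[:, None]

# Prepend the ID column to match the layout asked for in the question.
print(np.column_stack([ids, sums]))
print(np.column_stack([ids, means]))
```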

how to exchange position of array terms use numpy in python?

a = np.arange(12).reshape(2,3,2)
[[[ 0 1]
[ 2 3]
[ 4 5]]
[[ 6 7]
[ 8 9]
[10 11]]]
How can I exchange the positions of [4 5] and [10 11] using numpy? Thanks
Those rows can be sliced with:
In [1418]: a[:,2,:]
Out[1418]:
array([[ 4, 5],
[10, 11]])
viewed in reverse order with:
In [1419]: a[::-1,2,:]
Out[1419]:
array([[10, 11],
[ 4, 5]])
and replaced with:
In [1420]: a[:,2,:] = a[::-1,2,:]
In [1421]: a
Out[1421]:
array([[[ 0, 1],
[ 2, 3],
[10, 11]],
[[ 6, 7],
[ 8, 9],
[ 4, 5]]])
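If assigning between overlapping views feels fragile, a variant (my own sketch, not part of the answer above) indexes the first axis with lists. Indexing with a list returns a copy, so the right-hand side is fully built before anything is written back:

```python
import numpy as np

a = np.arange(12).reshape(2, 3, 2)

# The right-hand side is a copy holding the two rows in swapped order.
a[[0, 1], 2] = a[[1, 0], 2]

print(a)
```

The same pattern should extend to swapping any two rows by changing the index lists.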

numpy custom array element retrieval

I have a question regarding how to extract certain values from a 2D numpy array
Foo =
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
Bar =
array([[0, 0, 1],
[1, 2, 3]])
I want to extract elements from Foo using the values of Bar as row indices, such that I end up with a 2D matrix/array Baz of the same shape as Bar. The ith column in Baz corresponds to Foo[(np.array(each j in Bar[:,i]),np.array(i,i,i,i ...))]
Baz =
array([[ 1, 2, 6],
[ 4, 8, 12]])
I could do a couple nested for-loops but I was wondering if there is a more elegant, numpy-ish way to do this.
Sorry if this is a bit convoluted. Let me know if I need to explain further.
Thanks!
You can use Bar as the row index and an array [0, 1, 2] as the column index:
# for easy copy-pasting
import numpy as np
Foo = np.array([[ 1, 2, 3], [ 4, 5, 6], [ 7, 8, 9], [10, 11, 12]])
Bar = np.array([[0, 0, 1], [1, 2, 3]])
# now use Bar as the `i` coordinate and 0, 1, 2 as the `j` coordinate:
Foo[Bar, [0, 1, 2]]
# array([[ 1, 2, 6],
# [ 4, 8, 12]])
# OR, to automatically generate the [0, 1, 2]
Foo[Bar, np.arange(Bar.shape[1])]  # xrange only exists in Python 2
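For reference, np.take_along_axis can express the same lookup without building the column index by hand. This is a sketch under the assumption that Bar holds valid row indices into Foo and has as many columns as Foo:

```python
import numpy as np

Foo = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
Bar = np.array([[0, 0, 1], [1, 2, 3]])

# For every position (i, j), pick Foo[Bar[i, j], j].
Baz = np.take_along_axis(Foo, Bar, axis=0)

# Explicit advanced indexing for comparison.
same = Foo[Bar, np.arange(Bar.shape[1])]

print(Baz)
```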