Is it possible to filter the data after a groupby aggregation?
I have aggregated the sum after applying a groupby, and I want to see only the rows where the sum falls between certain values.
Here is a basic example:
A = pd.DataFrame([
[1, 2],
[2, 3],
[1, 6],
[2, 7],
[3, 5],
[2, 9],
[4, 7],
[3, 5],
[3, 9],
[3, 4]
], columns=['id', 'val'])
display(A)
display(A.groupby(['id']).agg({'val': ['sum', 'count']}))
I want the count of val to be between 1 and 4 after the aggregation.
I didn't understand whether you wanted the sum between 1 and 4 or the count, so here is how to do it for both options:
import pandas as pd
A = pd.DataFrame([
[1, 2],
[2, 3],
[1, 6],
[2, 7],
[3, 5],
[2, 9],
[4, 7],
[3, 5],
[3, 9],
[3, 4],
[1, 2],
[1, 2],
[1, 2],
[1, 2],
[1, 2],
], columns=['id', 'val'])
s = A.groupby(['id']).agg({'val': ['sum', 'count']})
# If you want the count
s[(s['val']['count']<=4) & (s['val']['count']>=1)]
# If you want the sum
s[(s['val']['sum']<=4) & (s['val']['sum']>=1)]
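As a side note, the same filters can be written a bit more compactly with Series.between (inclusive on both ends by default); a minimal sketch assuming the s frame built above:
# Groups whose count of val is between 1 and 4
s[s[('val', 'count')].between(1, 4)]
# Groups whose sum of val is between 1 and 4
s[s[('val', 'sum')].between(1, 4)]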
I have two numpy matrices (6 rows and 3 columns):
a = np.array([[1,2,4],[3,6,2],[3,4,7],[9,7,7],[6,3,1],[3,5,9]])
b = np.array([[4,5,2],[9,2,5],[1,5,6],[4,5,6],[1,2,6],[6,4,3]])
a = array([[1, 2, 4],
[3, 6, 2],
[3, 4, 7],
[9, 7, 7],
[6, 3, 1],
[3, 5, 9]])
b = array([[4, 5, 2],
[9, 2, 5],
[1, 5, 6],
[4, 5, 6],
[1, 2, 6],
[6, 4, 3]])
I would like to calculate the Pearson correlation coefficient between the first columns of a and b, the second columns of a and b, and the third columns of a and b.
The result would be a vector of length 3 (3 correlation coefficients).
One way using numpy.corrcoef and diagonal:
corr = np.corrcoef(a.T, b.T).diagonal(a.shape[1])
corr
Output:
array([-0.2324843 , -0.03631365, -0.18057878])
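As a sanity check, the same three coefficients can be computed column by column in a plain loop; a minimal sketch, reusing a and b from the question, that should agree with the one-liner above:
import numpy as np
# Pearson coefficient of a[:, i] vs b[:, i] for each column i
corr_loop = np.array([np.corrcoef(a[:, i], b[:, i])[0, 1] for i in range(a.shape[1])])
corr_loop  # should match corr from above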
I have a numpy array A as follows:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
and another numpy array column_indices_to_be_deleted as follows:
array([1, 0, 2])
I want to delete the element from every row of A specified by the column indices in column_indices_to_be_deleted. So, column index 1 from row 0, column index 0 from row 1 and column index 2 from row 2 in this case, to get a new array that looks like this:
array([[1, 3],
[5, 6],
[7, 8]])
What would be the simplest way of doing that?
One way with masking created with broadcasted comparison -
In [43]: a # input array
Out[43]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [44]: remove_idx # indices to be removed from each row
Out[44]: array([1, 0, 2])
In [45]: n = a.shape[1]
In [46]: a[remove_idx[:,None]!=np.arange(n)].reshape(-1,n-1)
Out[46]:
array([[1, 3],
[5, 6],
[7, 8]])
Another mask-based approach, with the mask created via array assignment -
In [47]: mask = np.ones(a.shape,dtype=bool)
In [48]: mask[np.arange(len(remove_idx)), remove_idx] = 0
In [49]: a[mask].reshape(-1,a.shape[1]-1)
Out[49]:
array([[1, 3],
[5, 6],
[7, 8]])
Another with np.delete -
In [64]: m,n = a.shape
In [66]: np.delete(a.flat,remove_idx+n*np.arange(m)).reshape(m,-1)
Out[66]:
array([[1, 3],
[5, 6],
[7, 8]])
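For convenience, the broadcasted-comparison idea can be wrapped into a small helper; a minimal sketch (the name delete_per_row is just illustrative):
import numpy as np

def delete_per_row(a, remove_idx):
    # Boolean mask that is False at (row, remove_idx[row]) and True elsewhere,
    # then keep the remaining n-1 entries of each row
    n = a.shape[1]
    mask = remove_idx[:, None] != np.arange(n)
    return a[mask].reshape(-1, n - 1)

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
delete_per_row(a, np.array([1, 0, 2]))
# array([[1, 3],
#        [5, 6],
#        [7, 8]])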
Suppose I have a numpy array as follows:
data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9], [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
I would like to randomly select n valid (non-NaN) items from the array, along with their indices.
Does numpy provide an efficient way of doing this?
Example
data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9], [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
n = 5
Get valid indices
y_val, x_val = np.where(~np.isnan(data))
n_val = y_val.size
Pick random subset of size n by index
pick = np.random.choice(n_val, n)
Apply index to valid coordinates
y_pick, x_pick = y_val[pick], x_val[pick]
Get corresponding data
data_pick = data[y_pick, x_pick]
Admire
data_pick
# array([2., 8., 1., 1., 2.])
y_pick
# array([3, 0, 0, 2, 3])
x_pick
# array([3, 2, 0, 2, 3])
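If the n picks should be distinct positions (np.random.choice samples with replacement by default, which is why duplicates can appear above), a minimal tweak:
# Sample without replacement so no (row, col) position is picked twice
pick = np.random.choice(n_val, n, replace=False)
y_pick, x_pick = y_val[pick], x_val[pick]
data_pick = data[y_pick, x_pick]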
Find the valid (non-NaN) positions with np.argwhere (np.nonzero would not work here, since NaN counts as nonzero and a genuine 0 would be dropped):
In [37]: a = np.argwhere(~np.isnan(data))
In [38]: a
Out[38]:
array([[0, 0],
[0, 1],
[0, 2],
[1, 1],
[1, 2],
[1, 3],
[2, 1],
[2, 2],
[2, 3],
[3, 0],
[3, 3]])
Now pick a random one:
In [44]: idx = np.random.choice(np.arange(len(a)))
In [45]: data[a[idx][0],a[idx][1]]
Out[45]: 2.0
I'd like to turn an open mesh returned by the numpy ix_ routine into a list of coordinates,
e.g., for:
In[1]: m = np.ix_([0, 2, 4], [1, 3])
In[2]: m
Out[2]:
(array([[0],
[2],
[4]]), array([[1, 3]]))
What I would like is:
([0, 1], [0, 3], [2, 1], [2, 3], [4, 1], [4, 3])
I'm pretty sure I could hack it together with some iterating, unpacking and zipping, but I'm sure there must be a smart numpy way of achieving this...
Approach #1 Use np.meshgrid and then stack -
r,c = np.meshgrid(*m)
out = np.column_stack((r.ravel('F'), c.ravel('F') ))
Approach #2 Alternatively, with np.array() and then transposing, reshaping -
np.array(np.meshgrid(*m)).T.reshape(-1,len(m))
For the generic case with an arbitrary number of arrays passed to np.ix_, here are the modifications needed -
p = np.r_[2:0:-1,3:len(m)+1,0]
out = np.array(np.meshgrid(*m)).transpose(p).reshape(-1,len(m))
Sample runs -
Two arrays case :
In [376]: m = np.ix_([0, 2, 4], [1, 3])
In [377]: p = np.r_[2:0:-1,3:len(m)+1,0]
In [378]: np.array(np.meshgrid(*m)).transpose(p).reshape(-1,len(m))
Out[378]:
array([[0, 1],
[0, 3],
[2, 1],
[2, 3],
[4, 1],
[4, 3]])
Three arrays case :
In [379]: m = np.ix_([0, 2, 4], [1, 3],[6,5,9])
In [380]: p = np.r_[2:0:-1,3:len(m)+1,0]
In [381]: np.array(np.meshgrid(*m)).transpose(p).reshape(-1,len(m))
Out[381]:
array([[0, 1, 6],
[0, 1, 5],
[0, 1, 9],
[0, 3, 6],
[0, 3, 5],
[0, 3, 9],
[2, 1, 6],
[2, 1, 5],
[2, 1, 9],
[2, 3, 6],
[2, 3, 5],
[2, 3, 9],
[4, 1, 6],
[4, 1, 5],
[4, 1, 9],
[4, 3, 6],
[4, 3, 5],
[4, 3, 9]])
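As a side note, since the arrays returned by np.ix_ already broadcast against each other, the coordinate list can also be obtained by broadcasting and stacking; a minimal sketch:
import numpy as np
m = np.ix_([0, 2, 4], [1, 3])
# Broadcast the open-mesh arrays to a common shape, stack along the last axis,
# then flatten to one coordinate per row
coords = np.stack(np.broadcast_arrays(*m), axis=-1).reshape(-1, len(m))
coords
# array([[0, 1],
#        [0, 3],
#        [2, 1],
#        [2, 3],
#        [4, 1],
#        [4, 3]])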
How can the result below be obtained from the given NumPy arrays (xx and yy)?
>>> xx, yy = np.mgrid[0:2, 5:7]
>>> xx
array([[0, 0],
[1, 1]])
>>> yy
array([[5, 6],
[5, 6]])
>>> result = [(0,5), (1,5), (1,6), (0,6)]
>>> result
[(0, 5), (1, 5), (1, 6), (0, 6)]
>>>
The order in your example requires some fancy indexing of xx. I had to reverse the order of the 2nd column.
In [243]: np.array([np.array([xx[:,0], xx[::-1,1]]).flatten(), yy.T.flatten()]).T.tolist()
Out[243]: [[0, 5], [1, 5], [1, 6], [0, 6]]
If the order isn't so important, then we can treat xx just like yy:
In [256]: xx, yy = np.mgrid[0:3, 5:8]
In [257]: np.array([xx.T.flatten(),yy.T.flatten()]).T.tolist()
Out[257]: [[0, 5], [1, 5], [2, 5], [0, 6], [1, 6], [2, 6], [0, 7], [1, 7], [2, 7]]
In [258]: np.array([xx.flatten(),yy.flatten()]).T.tolist()
Out[258]: [[0, 5], [0, 6], [0, 7], [1, 5], [1, 6], [1, 7], [2, 5], [2, 6], [2, 7]]
In [264]: np.array([xx,yy]).reshape(2,-1).T.tolist()
Out[264]: [[0, 5], [0, 6], [0, 7], [1, 5], [1, 6], [1, 7], [2, 5], [2, 6], [2, 7]]
In [272]: np.dstack([xx,yy]).reshape(-1,2).tolist()
Out[272]: [[0, 5], [0, 6], [0, 7], [1, 5], [1, 6], [1, 7], [2, 5], [2, 6], [2, 7]]
In [302]: list(np.broadcast(*np.ogrid[0:3,5:8]))
Out[302]: [(0, 5), (0, 6), (0, 7), (1, 5), (1, 6), (1, 7), (2, 5), (2, 6), (2, 7)]
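If plain Python tuples are wanted, as in the example result, zipping the flattened transposed grids also works; a minimal sketch using xx, yy from In [256]:
list(zip(xx.T.ravel().tolist(), yy.T.ravel().tolist()))
# [(0, 5), (1, 5), (2, 5), (0, 6), (1, 6), (2, 6), (0, 7), (1, 7), (2, 7)]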