Is it possible to filter the data after a groupby aggregation?
I have aggregated the sum after applying a groupby, and I want to see only the rows where the sum falls between certain values.
Here is a basic example:
A = pd.DataFrame([
[1, 2],
[2, 3],
[1, 6],
[2, 7],
[3, 5],
[2, 9],
[4, 7],
[3, 5],
[3, 9],
[3, 4]
], columns=['id', 'val'])
display(A)
display(A.groupby(['id']).agg({'val': ['sum', 'count']}))
I want the count of val to be between 1 and 4 after the aggregation.
I didn't understand whether you wanted the sum between 1 and 4 or the count, so here is how to do it for both options:
import pandas as pd
A = pd.DataFrame([
[1, 2],
[2, 3],
[1, 6],
[2, 7],
[3, 5],
[2, 9],
[4, 7],
[3, 5],
[3, 9],
[3, 4],
[1, 2],
[1, 2],
[1, 2],
[1, 2],
[1, 2],
], columns=['id', 'val'])
s = A.groupby(['id']).agg({'val': ['sum', 'count']})
# If you want the count
s[(s['val']['count']<=4) & (s['val']['count']>=1)]
# If you want the sum
s[(s['val']['sum']<=4) & (s['val']['sum']>=1)]
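As a side note, the same filters can be written a bit more compactly with Series.between (inclusive on both ends by default); a minimal sketch assuming the s frame built above:
# Groups whose count of val is between 1 and 4
s[s[('val', 'count')].between(1, 4)]
# Groups whose sum of val is between 1 and 4
s[s[('val', 'sum')].between(1, 4)]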
I have two numpy matrices (6 rows and 3 columns):
a = np.array([[1,2,4],[3,6,2],[3,4,7],[9,7,7],[6,3,1],[3,5,9]])
b = np.array([[4,5,2],[9,2,5],[1,5,6],[4,5,6],[1,2,6],[6,4,3]])
a = array([[1, 2, 4],
[3, 6, 2],
[3, 4, 7],
[9, 7, 7],
[6, 3, 1],
[3, 5, 9]])
b = array([[4, 5, 2],
[9, 2, 5],
[1, 5, 6],
[4, 5, 6],
[1, 2, 6],
[6, 4, 3]])
I would like to calculate the Pearson correlation coefficient between the first columns of a and b, the second columns of a and b, and the third columns of a and b.
The result would be a vector of length 3 (3 correlation coefficients).
One way using numpy.corrcoef and diagonal:
corr = np.corrcoef(a.T, b.T).diagonal(a.shape[1])
corr
Output:
array([-0.2324843 , -0.03631365, -0.18057878])
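As a sanity check, the same three coefficients can be computed column by column in a plain loop; a minimal sketch, reusing a and b from the question, that should agree with the one-liner above:
import numpy as np
# Pearson coefficient of a[:, i] vs b[:, i] for each column i
corr_loop = np.array([np.corrcoef(a[:, i], b[:, i])[0, 1] for i in range(a.shape[1])])
corr_loop  # should match corr from above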
I have a numpy array A as follows:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
and another numpy array column_indices_to_be_deleted as follows:
array([1, 0, 2])
I want to delete the element from every row of A specified by the column indices in column_indices_to_be_deleted. So, column index 1 from row 0, column index 0 from row 1 and column index 2 from row 2 in this case, to get a new array that looks like this:
array([[1, 3],
[5, 6],
[7, 8]])
What would be the simplest way of doing that?
One way with masking created with broadcasted comparison -
In [43]: a # input array
Out[43]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [44]: remove_idx # indices to be removed from each row
Out[44]: array([1, 0, 2])
In [45]: n = a.shape[1]
In [46]: a[remove_idx[:,None]!=np.arange(n)].reshape(-1,n-1)
Out[46]:
array([[1, 3],
[5, 6],
[7, 8]])
Another mask-based approach, with the mask created via array assignment -
In [47]: mask = np.ones(a.shape,dtype=bool)
In [48]: mask[np.arange(len(remove_idx)), remove_idx] = 0
In [49]: a[mask].reshape(-1,a.shape[1]-1)
Out[49]:
array([[1, 3],
[5, 6],
[7, 8]])
Another with np.delete -
In [64]: m,n = a.shape
In [66]: np.delete(a.flat,remove_idx+n*np.arange(m)).reshape(m,-1)
Out[66]:
array([[1, 3],
[5, 6],
[7, 8]])
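For convenience, the broadcasted-comparison idea can be wrapped into a small helper; a minimal sketch (the name delete_per_row is just illustrative):
import numpy as np

def delete_per_row(a, remove_idx):
    # Boolean mask that is False at (row, remove_idx[row]) and True elsewhere,
    # then keep the remaining n-1 entries of each row
    n = a.shape[1]
    mask = remove_idx[:, None] != np.arange(n)
    return a[mask].reshape(-1, n - 1)

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
delete_per_row(a, np.array([1, 0, 2]))
# array([[1, 3],
#        [5, 6],
#        [7, 8]])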
Suppose I have a numpy array as follows:
data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9], [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
I would like to randomly select n valid (non-NaN) items from the array, along with their indices.
Does numpy provide an efficient way of doing this?
Example
data = np.array([[1, 3, 8, np.nan], [np.nan, 6, 7, 9], [np.nan, 0, 1, 2], [5, np.nan, np.nan, 2]])
n = 5
Get valid indices
y_val, x_val = np.where(~np.isnan(data))
n_val = y_val.size
Pick random subset of size n by index
pick = np.random.choice(n_val, n)
Apply index to valid coordinates
y_pick, x_pick = y_val[pick], x_val[pick]
Get corresponding data
data_pick = data[y_pick, x_pick]
Admire
data_pick
# array([2., 8., 1., 1., 2.])
y_pick
# array([3, 0, 0, 2, 3])
x_pick
# array([3, 2, 0, 2, 3])
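If the n picks should be distinct positions (np.random.choice samples with replacement by default, which is why duplicates can appear above), a minimal tweak:
# Sample without replacement so no (row, col) position is picked twice
pick = np.random.choice(n_val, n, replace=False)
y_pick, x_pick = y_val[pick], x_val[pick]
data_pick = data[y_pick, x_pick]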
Find the valid (non-NaN) positions with np.argwhere (np.nonzero would not work here, since NaN counts as nonzero and a genuine 0 would be dropped):
In [37]: a = np.argwhere(~np.isnan(data))
In [38]: a
Out[38]:
array([[0, 0],
[0, 1],
[0, 2],
[1, 1],
[1, 2],
[1, 3],
[2, 1],
[2, 2],
[2, 3],
[3, 0],
[3, 3]])
Now pick a random one:
In [44]: idx = np.random.choice(np.arange(len(a)))
In [45]: data[a[idx][0],a[idx][1]]
Out[45]: 2.0
I'd like to turn an open mesh returned by the numpy ix_ routine into a list of coordinates,
e.g., for:
In[1]: m = np.ix_([0, 2, 4], [1, 3])
In[2]: m
Out[2]:
(array([[0],
[2],
[4]]), array([[1, 3]]))
What I would like is:
([0, 1], [0, 3], [2, 1], [2, 3], [4, 1], [4, 3])
I'm pretty sure I could hack it together with some iterating, unpacking and zipping, but I'm sure there must be a smart numpy way of achieving this...
Approach #1 Use np.meshgrid and then stack -
r,c = np.meshgrid(*m)
out = np.column_stack((r.ravel('F'), c.ravel('F') ))
Approach #2 Alternatively, with np.array() and then transposing, reshaping -
np.array(np.meshgrid(*m)).T.reshape(-1,len(m))
For the generic case with an arbitrary number of arrays passed to np.ix_, here are the modifications needed -
p = np.r_[2:0:-1,3:len(m)+1,0]
out = np.array(np.meshgrid(*m)).transpose(p).reshape(-1,len(m))
Sample runs -
Two arrays case :
In [376]: m = np.ix_([0, 2, 4], [1, 3])
In [377]: p = np.r_[2:0:-1,3:len(m)+1,0]
In [378]: np.array(np.meshgrid(*m)).transpose(p).reshape(-1,len(m))
Out[378]:
array([[0, 1],
[0, 3],
[2, 1],
[2, 3],
[4, 1],
[4, 3]])
Three arrays case :
In [379]: m = np.ix_([0, 2, 4], [1, 3],[6,5,9])
In [380]: p = np.r_[2:0:-1,3:len(m)+1,0]
In [381]: np.array(np.meshgrid(*m)).transpose(p).reshape(-1,len(m))
Out[381]:
array([[0, 1, 6],
[0, 1, 5],
[0, 1, 9],
[0, 3, 6],
[0, 3, 5],
[0, 3, 9],
[2, 1, 6],
[2, 1, 5],
[2, 1, 9],
[2, 3, 6],
[2, 3, 5],
[2, 3, 9],
[4, 1, 6],
[4, 1, 5],
[4, 1, 9],
[4, 3, 6],
[4, 3, 5],
[4, 3, 9]])
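As a side note, since the arrays returned by np.ix_ already broadcast against each other, the coordinate list can also be obtained by broadcasting and stacking; a minimal sketch:
import numpy as np
m = np.ix_([0, 2, 4], [1, 3])
# Broadcast the open-mesh arrays to a common shape, stack along the last axis,
# then flatten to one coordinate per row
coords = np.stack(np.broadcast_arrays(*m), axis=-1).reshape(-1, len(m))
coords
# array([[0, 1],
#        [0, 3],
#        [2, 1],
#        [2, 3],
#        [4, 1],
#        [4, 3]])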
How can the result below be obtained from the given NumPy arrays (xx and yy)?
>>> xx, yy = np.mgrid[0:2, 5:7]
>>> xx
array([[0, 0],
[1, 1]])
>>> yy
array([[5, 6],
[5, 6]])
>>> result = [(0,5), (1,5), (1,6), (0,6)]
>>> result
[(0, 5), (1, 5), (1, 6), (0, 6)]
>>>
The order in your example requires some fancy indexing of xx. I had to reverse the order of the 2nd column.
In [243]: np.array([np.array([xx[:,0], xx[::-1,1]]).flatten(), yy.T.flatten()]).T.tolist()
Out[243]: [[0, 5], [1, 5], [1, 6], [0, 6]]
If the order isn't so important, then we can treat xx just like yy:
In [256]: xx, yy = np.mgrid[0:3, 5:8]
In [257]: np.array([xx.T.flatten(),yy.T.flatten()]).T.tolist()
Out[257]: [[0, 5], [1, 5], [2, 5], [0, 6], [1, 6], [2, 6], [0, 7], [1, 7], [2, 7]]
In [258]: np.array([xx.flatten(),yy.flatten()]).T.tolist()
Out[258]: [[0, 5], [0, 6], [0, 7], [1, 5], [1, 6], [1, 7], [2, 5], [2, 6], [2, 7]]
In [264]: np.array([xx,yy]).reshape(2,-1).T.tolist()
Out[264]: [[0, 5], [0, 6], [0, 7], [1, 5], [1, 6], [1, 7], [2, 5], [2, 6], [2, 7]]
In [272]: np.dstack([xx,yy]).reshape(-1,2).tolist()
Out[272]: [[0, 5], [0, 6], [0, 7], [1, 5], [1, 6], [1, 7], [2, 5], [2, 6], [2, 7]]
In [302]: list(np.broadcast(*np.ogrid[0:3,5:8]))
Out[302]: [(0, 5), (0, 6), (0, 7), (1, 5), (1, 6), (1, 7), (2, 5), (2, 6), (2, 7)]
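If plain Python tuples are wanted, as in the example result, zipping the flattened transposed grids also works; a minimal sketch using xx, yy from In [256]:
list(zip(xx.T.ravel().tolist(), yy.T.ravel().tolist()))
# [(0, 5), (1, 5), (2, 5), (0, 6), (1, 6), (2, 6), (0, 7), (1, 7), (2, 7)]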