I have a dataset in this form:
   crawled_flag  tech_flag  ug_flag
0             1          2        0
1             6          0        0
2             2          0        1
3             1          0        1
4             0          1        0
5             0          7        0
What I want is for the second row to become the sum of itself and all the rows below it. For example, in the crawled_flag column the second row's value should be 6+2+1+0+0 = 9.
Similarly, this should be my final dataset:
   crawled_flag  tech_flag  ug_flag
0             1          2        0
1             9          8        2
Can someone please help me achieve this?
Use concat to join the first row with the sum of all remaining rows, then transpose back into a DataFrame:
import pandas as pd

# keep the first row, append the column-wise sum of every row below it
df = pd.concat([df.iloc[0], df.iloc[1:].sum()], axis=1, ignore_index=True).T
print(df)
   crawled_flag  tech_flag  ug_flag
0             1          2        0
1             9          8        2
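An equivalent sketch that builds the same result from the two Series directly (assuming the same df as above):

out = pd.DataFrame([df.iloc[0], df.iloc[1:].sum()]).reset_index(drop=True)
print(out)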
This is my dataframe:
0 1 0 1 1
1 0 1 0 1
I generate the sum for each column as below:
data.iloc[:,1:] = data.iloc[:,1:].sum(axis=0)
The result is:
0 1 1 1 2
1 1 1 1 2
But I only want to update values that are not zero:
0 1 0 1 2
1 0 1 0 2
As it is a large dataframe and I don't know which columns will contain zeros, I am having trouble getting the condition to work together with iloc.
Assuming the following input:
0 1 2 3 4
0 0 1 0 1 1
1 1 0 1 0 1
you can use the underlying numpy array and numpy.where:
import numpy as np

# take everything except the first column as a numpy array
a = data.values[:, 1:]
# replace non-zero entries with their column sum, leave zeros untouched
data.iloc[:, 1:] = np.where(a != 0, a.sum(0), a)
output:
0 1 2 3 4
0 0 1 0 1 2
1 1 0 1 0 2
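If you prefer to stay in pandas, a small sketch of an equivalent approach; it relies on the fact that the entries to keep are exactly the zeros, so multiplying a boolean mask by the column sums reproduces them:

d = data.iloc[:, 1:]
# True (1) where non-zero becomes the column sum; False (0) keeps the zeros
data.iloc[:, 1:] = d.ne(0).mul(d.sum())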
I have a dataframe with sales orders and I want to get the count of the orderlines per order in every row:
Order Orderline
1 0
1 1
1 2
2 0
3 0
3 1
What I would like to obtain is:
Order Orderline Count
1 0 3
1 1 3
1 2 3
2 0 1
3 0 2
3 1 2
I tried using transform('count'), as I saw it used in "How to add a new column and fill it up with a specific value depending on another column's series?", but that didn't work out; it flattened my table instead.
Any ideas on how to accomplish this?
Just groupby and then transform:
df['Count'] = df.groupby('Order')['Orderline'].transform('count')
print(df)
Order Orderline Count
0 1 0 3
1 1 1 3
2 1 2 3
3 2 0 1
4 3 0 2
5 3 1 2
Alternatively, compute the count row by row with apply:
df['counts'] = df.apply(lambda x: (df['Order'] == x['Order']).sum(), axis=1)
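Note that this apply rescans the whole frame once per row, so it slows down on large data; the transform above does the same job in a single pass. An equivalent one-liner using size (which, unlike count, also counts NaNs) would be:

df['counts'] = df.groupby('Order')['Order'].transform('size')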
I'm trying to find the intersection counts for all possible triples (A, B, C) of column vectors in the following df.
dataframe
print(df)
0 1 2 3
0 1 0 0 0
1 0 0 1 1
2 1 0 1 1
3 0 1 0 0
4 1 0 1 0
df.T.dot(df)
gives the pairwise intersection counts of the column vectors:
0 1 2 3
0 3 0 2 1
1 0 1 0 0
2 2 0 3 2
3 1 0 2 2
How do I get the intersection counts for triples of column vectors?
E.g., (col 0, col 2, col 3) has value 1*1*1 = 1, since row 2 is the only row where all three columns are 1.
I'm trying to make a three dimensional association matrix for item-item-item similarity. What is the best approach here?
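One way to generalize the pairwise dot product to triples (a sketch, not a definitive answer) is a three-way einsum over the rows:

import numpy as np

a = df.to_numpy()
# triples[i, j, k] = number of rows where columns i, j and k are all 1
triples = np.einsum('ri,rj,rk->ijk', a, a, a)
print(triples[0, 2, 3])  # 1, matching the (col 0, col 2, col 3) example

Be aware that the result is an n x n x n array for n columns, so this only scales to a modest number of columns.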
I have a dataframe like the follows.
>>> data
target user data
0 A 1 0
1 A 1 0
2 A 1 1
3 A 2 0
4 A 2 1
5 B 1 1
6 B 1 1
7 B 1 0
8 B 2 0
9 B 2 0
10 B 2 1
You can see that each user may contribute multiple claims about a target. I want to keep only each user's most frequent data value for each target. For example, for the dataframe shown above, I want the following result:
>>> result
target user data
0 A 1 0
1 A 2 0
2 B 1 1
3 B 2 0
How can I do this? And can I do it using groupby? (My real dataframe is not sorted.)
Thanks!
Use groupby with a count transform to create a helper key, then take idxmax within each (target, user) group:

df['helperkey'] = df.groupby(['target', 'user', 'data']).data.transform('count')
df.groupby(['target', 'user']).helperkey.idxmax()
Out[10]:
target user
A 1 0
2 3
B 1 5
2 8
Name: helperkey, dtype: int64
df.loc[df.groupby(['target','user']).helperkey.idxmax()]
Out[11]:
target user data helperkey
0 A 1 0 2
3 A 2 0 1
5 B 1 1 2
8 B 2 0 2
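Dropping the helperkey column afterwards recovers the desired frame. As a sketch of an alternative (ties between equally frequent values are broken arbitrarily), the modal value can also be taken directly per group:

result = (df.groupby(['target', 'user'])['data']
            .agg(lambda s: s.value_counts().idxmax())
            .reset_index())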
Using groupby() in pandas I get the following output (A, B, C are the columns of the input table):
     C
A B
0 0  6
  2  1
  6  5
...
Output details: [244 rows x 1 columns]. I just want to have all 3 columns instead of one; how can I do that?
The output I wish for:
A  B  C
0  0  6
0  2  1
...
It appears to be undocumented, but it's simply gb.bfill(); see this example:
In [68]: df = pd.DataFrame({'A': [0, 0, 0, 0, 0, 0, 0, 0],
                            'B': [0, 0, 0, 0, 1, 1, 1, 1],
                            'C': [1, 2, 3, 4, 1, 2, 3, 4]})

In [69]: gb = df.groupby(['A', 'B'])

In [70]: print(gb.bfill())
A B C
0 0 0 1
1 0 0 2
2 0 0 3
3 0 0 4
4 0 1 1
5 0 1 2
6 0 1 3
7 0 1 4
[8 rows x 3 columns]
But I don't see why you need to do that; don't you end up with the original DataFrame (only maybe rearranged)?
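For what it's worth, if the [244 rows x 1 columns] object is an aggregated result, the usual way to get A, B and C back as ordinary columns is reset_index() (a sketch, assuming an aggregation such as sum):

out = df.groupby(['A', 'B'])['C'].sum().reset_index()
# or keep the keys as columns from the start:
out = df.groupby(['A', 'B'], as_index=False)['C'].sum()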