At the replication of a dataframe using concat with index (see example here), is there a way I can assign a count variable for each iteration in column c (where column c is the count variable)?
Orig df:
a
b
0
1
2
1
2
3
df replicated with pd.concat[df]*5 and with an additional Column c:
a
b
c
0
1
2
1
1
2
3
1
0
1
2
2
1
2
3
2
0
1
2
3
1
2
3
3
0
1
2
4
1
2
3
4
0
1
2
5
1
2
3
5
This is a multi-row dataframe where the count variable would have to be applied to multiple rows.
Thanks for your thoughts!
You could use np.arange and np.repeat:
N = 5
new_df = pd.concat([df] * N)
new_df['c'] = np.repeat(np.arange(N), df.shape[0]) + 1
Output:
>>> new_df
a b c
0 1 2 1
1 2 3 1
0 1 2 2
1 2 3 2
0 1 2 3
1 2 3 3
0 1 2 4
1 2 3 4
0 1 2 5
1 2 3 5
For the following dataframe:
import pandas as pd
df=pd.DataFrame({'list_A':[3,3,3,3,3,\
2,2,2,2,2,2,2,4,4,4,4,4,4,4,4,4,4,4,4]})
How can 'list_A' be manipulated to give 'list_B'?
Desired output:
list_A
list_B
0
3
1
1
3
1
2
3
1
3
3
0
4
2
1
5
2
1
6
2
0
7
2
0
8
4
1
9
4
1
10
4
1
11
4
1
12
4
0
13
4
0
14
4
0
15
4
0
16
4
0
As you can see, if List_A has the number 3 - then the first 3 values of List_B are '1' and then the value of List_B changes to '0', until List_A changes value again.
GroupBy.cumcount
df['list_B'] = df['list_A'].gt(df.groupby('list_A').cumcount()).astype(int)
print(df)
Output
list_A list_B
0 3 1
1 3 1
2 3 1
3 3 0
4 3 0
5 2 1
6 2 1
7 2 0
8 2 0
9 2 0
10 2 0
11 2 0
12 4 1
13 4 1
14 4 1
15 4 1
16 4 0
17 4 0
18 4 0
19 4 0
20 4 0
21 4 0
22 4 0
23 4 0
EDIT
blocks = df['list_A'].ne(df['list_A'].shift()).cumsum()
df['list_B'] = df['list_A'].gt(df.groupby(blocks).cumcount()).astype(int)
I’ve a pd df consists three columns: ID, t, and ind1.
import pandas as pd
dat = {'ID': [1,1,1,1,2,2,2,3,3,3,3,4,4,4,5,5,6,6,6],
't': [0,1,2,3,0,1,2,0,1,2,3,0,1,2,0,1,0,1,2],
'ind1' : [1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0]
}
df = pd.DataFrame(dat, columns = ['ID', 't', 'ind1'])
print (df)
What I need to do is to create a new column (res) that
for all ID with ind1==0, then res is zero.
for all ID with
ind1==1 and if t==max(t) (group by ID), then res = 1, otherwise zero.
Here’s anticipated output
Check with groupby with idxmax , then where with transform all
df['res']=df.groupby('ID').t.transform('idxmax').where(df.groupby('ID').ind1.transform('all')).eq(df.index).astype(int)
df
Out[160]:
ID t ind1 res
0 1 0 1 0
1 1 1 1 0
2 1 2 1 0
3 1 3 1 1
4 2 0 0 0
5 2 1 0 0
6 2 2 0 0
7 3 0 0 0
8 3 1 0 0
9 3 2 0 0
10 3 3 0 0
11 4 0 1 0
12 4 1 1 0
13 4 2 1 1
14 5 0 1 0
15 5 1 1 1
16 6 0 0 0
17 6 1 0 0
18 6 2 0 0
This works on the knowledge that the ID column is sorted :
cond1 = df.ind1.eq(0)
cond2 = df.ind1.eq(1) & (df.t.eq(df.groupby("ID").t.transform("max")))
df["res"] = np.select([cond1, cond2], [0, 1], 0)
df
ID t ind1 res
0 1 0 1 0
1 1 1 1 0
2 1 2 1 0
3 1 3 1 1
4 2 0 0 0
5 2 1 0 0
6 2 2 0 0
7 3 0 0 0
8 3 1 0 0
9 3 2 0 0
10 3 3 0 0
11 4 0 1 0
12 4 1 1 0
13 4 2 1 1
14 5 0 1 0
15 5 1 1 1
16 6 0 0 0
17 6 1 0 0
18 6 2 0 0
Use groupby.apply:
df['res'] = (df.groupby('ID').apply(lambda x: x['ind1'].eq(1)&x['t'].eq(x['t'].max()))
.astype(int).reset_index(drop=True))
print(df)
ID t ind1 res
0 1 0 1 0
1 1 1 1 0
2 1 2 1 0
3 1 3 1 1
4 2 0 0 0
5 2 1 0 0
6 2 2 0 0
7 3 0 0 0
8 3 1 0 0
9 3 2 0 0
10 3 3 0 0
11 4 0 1 0
12 4 1 1 0
13 4 2 1 1
14 5 0 1 0
15 5 1 1 1
16 6 0 0 0
17 6 1 0 0
18 6 2 0 0
Suppose I have this data frame and I want to aggregate and sum values on column 'a' based on the labels that have the same amount.
a label
0 1 0
1 3 0
2 5 0
3 2 1
4 2 1
5 2 1
6 3 0
7 3 0
8 4 1
The desired result will be:
a label
0 9 0
1 6 1
2 6 0
3 4 1
and not this:
a label
0 15 0
1 10 1
IIUC
s=df.groupby(df.label.diff().ne(0).cumsum()).agg({'a':'sum','label':'first'})
s
Out[280]:
a label
label
1 9 0
2 6 1
3 6 0
4 4 1
I've generated a list of combination and would like to turn it into "dummies" matrix
import pandas as pd
from itertools import combinations
comb = pd.DataFrame(list(combinations(range(1, 6), 4)))
0 1 2 3
0 1 2 3 4
1 1 2 3 5
2 1 2 4 5
3 1 3 4 5
4 2 3 4 5
would like to turn the above dataframe to a dataframe look like below. Thanks.
1 2 3 4 5
0 1 1 1 1 0
1 1 1 1 0 1
2 1 1 0 1 1
3 1 0 1 1 1
4 0 1 1 1 1
You can use MultiLabelBinarizer:
from sklearn.preprocessing import MultiLabelBinarizer
lb = MultiLabelBinarizer()
df = pd.DataFrame(lb.fit_transform(comb.values), columns= lb.classes_)
print (df)
1 2 3 4 5
0 1 1 1 1 0
1 1 1 1 0 1
2 1 1 0 1 1
3 1 0 1 1 1
4 0 1 1 1 1