How to create a rolling unique count by group using pandas

I have a dataframe like the following:
group value
1 a
1 a
1 b
1 b
1 b
1 b
1 c
2 d
2 d
2 d
2 d
2 e
I want to create a column counting how many unique values have appeared so far within each group, like below:
group value group_value_id
1 a 1
1 a 1
1 b 2
1 b 2
1 b 2
1 b 2
1 c 3
2 d 1
2 d 1
2 d 1
2 d 1
2 e 2

Use a custom lambda function with GroupBy.transform and pandas.factorize:
df['group_value_id'] = df.groupby('group')['value'].transform(lambda x: pd.factorize(x)[0]) + 1
print (df)
group value group_value_id
0 1 a 1
1 1 a 1
2 1 b 2
3 1 b 2
4 1 b 2
5 1 b 2
6 1 c 3
7 2 d 1
8 2 d 1
9 2 d 1
10 2 d 1
11 2 e 2
This is needed because a dense rank fails on the non-numeric value column:
df['group_value_id'] = df.groupby('group')['value'].rank('dense')
print (df)
DataError: No numeric types to aggregate
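A minimal, self-contained sketch of the factorize approach, using the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'value': list('aabbbbcdddde'),
})

# factorize assigns 0-based integer codes in order of first appearance,
# so adding 1 yields a 1-based running unique-value id within each group
df['group_value_id'] = (
    df.groupby('group')['value']
      .transform(lambda x: pd.factorize(x)[0] + 1)
)

print(df['group_value_id'].tolist())  # [1, 1, 2, 2, 2, 2, 3, 1, 1, 1, 1, 2]
```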

It can also be solved as:
df['group_val_id'] = (df.groupby('group')['value']
                      .apply(lambda x: x.astype('category').cat.codes + 1))
df
group value group_val_id
0 1 a 1
1 1 a 1
2 1 b 2
3 1 b 2
4 1 b 2
5 1 b 2
6 1 c 3
7 2 d 1
8 2 d 1
9 2 d 1
10 2 d 1
11 2 e 2
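One caveat worth knowing about the category-codes variant: cat.codes numbers values in sorted category order, while factorize numbers them in order of first appearance. The two agree on the question's data only because each group's values happen to first appear in sorted order. A sketch of the difference:

```python
import pandas as pd

s = pd.Series(['b', 'a', 'b'])

# factorize: order of first appearance -> b=0, a=1
print((pd.factorize(s)[0] + 1).tolist())            # [1, 2, 1]

# cat.codes: sorted category order -> a=0, b=1
print((s.astype('category').cat.codes + 1).tolist())  # [2, 1, 2]
```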

Related

Reorder pandas DataFrame based on a repetitive set of integers in the index

I have a pandas dataframe that contains some columns, and I couldn't find a way to order its rows as follows:
I need to order the dataframe by the column I, but in sequential (round-robin) order, cycling through the groups.
Input
I category tags
1 A #25-74
1 B #26-170
0 C #29-106
2 A #18-109
3 B #26-86
2 A #26-108
2 C #30-125
1 B #28-145
0 B #29-93
0 D #21-102
1 F #26-108
2 F #30-125
3 A #28-145
3 D #29-93
0 B #21-102
Needed Order:
I category tags
0 C #29-106
1 B #25-74
2 F #18-109
3 C #26-86
0 B #29-93
1 D #26-170
2 B #26-108
3 B #28-145
0 C #21-102
1 D #28-145
2 A #30-125
3 A #29-93
0 B #21-102
1 A #26-108
2 C #30-125
I have searched for different ways to sort, but couldn't find a way to do this using only pandas.
I appreciate any help!
One idea: create a helper column with GroupBy.cumcount, then use DataFrame.sort_values:
df['a'] = df.groupby('I').cumcount()
df = df.sort_values(['a','I'])
print (df)
I category tags a
2 0 C #29-106 0
0 1 A #25-74 0
3 2 A #18-109 0
4 3 B #26-86 0
8 0 B #29-93 1
1 1 B #26-170 1
5 2 A #26-108 1
12 3 A #28-145 1
9 0 D #21-102 2
7 1 B #28-145 2
6 2 C #30-125 2
13 3 D #29-93 2
14 0 B #21-102 3
10 1 F #26-108 3
11 2 F #30-125 3
Or first sort by column I and then change the order with Series.argsort and DataFrame.iloc:
df = df.sort_values('I')
df = df.iloc[df.groupby('I').cumcount().argsort()]
print (df)
I category tags
2 0 C #29-106
0 1 A #25-74
3 2 A #18-109
4 3 B #26-86
8 0 B #29-93
1 1 B #26-170
5 2 A #26-108
12 3 A #28-145
9 0 D #21-102
7 1 B #28-145
6 2 C #30-125
13 3 D #29-93
14 0 B #21-102
10 1 F #26-108
11 2 F #30-125
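The cumcount idea can be checked on a small frame (hypothetical data, not the question's): numbering rows within each group and then sorting by (occurrence, group) interleaves the groups round-robin.

```python
import pandas as pd

df = pd.DataFrame({'I': [1, 1, 0, 2, 0, 2],
                   'tags': ['x1', 'x2', 'y1', 'z1', 'y2', 'z2']})

# number rows within each group: 0, 1, 2, ... per value of I
df['a'] = df.groupby('I').cumcount()

# sorting by (occurrence, group) interleaves the groups round-robin
out = df.sort_values(['a', 'I']).drop(columns='a')
print(out['tags'].tolist())  # ['y1', 'x1', 'z1', 'y2', 'x2', 'z2']
```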

Pandas - want to create new variable based on last occurrence of element in reference variable?

I have a DataFrame:
col count
0 B 1
1 B 2
2 A 1
3 A 2
4 A 3
5 C 1
6 C 2
7 C 3
8 C 4
I want to create a new variable named Flag that marks the last occurrence of each value in the col variable. Reference df:
col count Flag
0 B 1 0
1 B 2 1
2 A 1 0
3 A 2 0
4 A 3 1
5 C 1 0
6 C 2 0
7 C 3 0
8 C 4 1
TIA
Use Series.duplicated with numpy.where:
df['Flag'] = np.where(df['col'].duplicated(keep='last'), 0, 1)
Or invert the mask with ~ and cast it to integers with Series.view:
df['Flag'] = (~df['col'].duplicated(keep='last')).view('i1')
print (df)
col count Flag
0 B 1 0
1 B 2 1
2 A 1 0
3 A 2 0
4 A 3 1
5 C 1 0
6 C 2 0
7 C 3 0
8 C 4 1
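The core trick is that duplicated(keep='last') marks every row except the last occurrence of each value; a minimal sketch on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': list('BBAAACCCC'),
                   'count': [1, 2, 1, 2, 3, 1, 2, 3, 4]})

# duplicated(keep='last') is True for all but the last occurrence,
# so np.where flags the last occurrence of each value with 1
df['Flag'] = np.where(df['col'].duplicated(keep='last'), 0, 1)
print(df['Flag'].tolist())  # [0, 1, 0, 0, 1, 0, 0, 0, 1]
```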

Replace values of duplicated rows with first record in pandas?

Input
df
id label
a 1
b 2
a 3
a 4
b 2
b 3
c 1
c 2
d 2
d 3
Expected
df
id label
a 1
b 2
a 1
a 1
b 2
b 2
c 1
c 1
d 2
d 2
For id a, the label value is 1 and id b is 2 because 1 and 2 is the first record for a and b.
I tried the approach from this post, but it still didn't solve it.
Use GroupBy.transform with 'first':
df['lb2']=df.groupby('id').label.transform('first')
df
Out[87]:
id label lb2
0 a 1 1
1 b 2 2
2 a 3 1
3 a 4 1
4 b 2 2
5 b 3 2
6 c 1 1
7 c 2 1
8 d 2 2
9 d 3 2
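A runnable sketch of the transform('first') idea, replacing label in place rather than adding a second column (an assumption about what the asker ultimately wants):

```python
import pandas as pd

df = pd.DataFrame({'id': list('abaabbccdd'),
                   'label': [1, 2, 3, 4, 2, 3, 1, 2, 2, 3]})

# broadcast the first label seen for each id back to every row of that id
df['label'] = df.groupby('id')['label'].transform('first')
print(df['label'].tolist())  # [1, 2, 1, 1, 2, 2, 1, 1, 2, 2]
```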

compare two columns of two dataframes in pandas

I have 2 dataframes like:
df_out:
a b c d
1 1 2 1
2 1 2 3
3 1 3 5
df_fin:
a e f g
1 0 2 1
2 5 2 3
3 1 3 5
5 2 4 6
7 3 2 5
I want to get a result like:
a b c d a e f g
1 1 2 1 1 0 2 1
2 1 2 3 2 5 2 3
3 1 3 5 3 1 3 5
In other words, I have two different dataframes that share one column (a). I want to compare these two columns (df_fin.a and df_out.a), select the rows from df_fin that have the same value in column a, and create a new dataframe with the selected rows from df_fin plus the columns from df_out.
I think you need merge with left join:
df = pd.merge(df_out, df_fin, on='a', how='left')
print (df)
a b c d e f g
0 1 1 2 1 0 2 1
1 2 1 2 3 5 2 3
2 3 1 3 5 1 3 5
EDIT: the same result with Series.isin and DataFrame.join:
df1 = df_fin[df_fin['a'].isin(df_out['a'])]
df2 = df_out.join(df1.set_index('a'), on='a')
print (df2)
a b c d e f g
0 1 1 2 1 0 2 1
1 2 1 2 3 5 2 3
2 3 1 3 5 1 3 5
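The merge variant can be checked on the question's data; note that the shared key a appears only once in the result, and how='inner' would give the same rows here since every a in df_out also appears in df_fin:

```python
import pandas as pd

df_out = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 1, 1],
                       'c': [2, 2, 3], 'd': [1, 3, 5]})
df_fin = pd.DataFrame({'a': [1, 2, 3, 5, 7], 'e': [0, 5, 1, 2, 3],
                       'f': [2, 2, 3, 4, 2], 'g': [1, 3, 5, 6, 5]})

# left join keeps every row of df_out and pulls in matching df_fin columns
merged = pd.merge(df_out, df_fin, on='a', how='left')
print(merged.columns.tolist())  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(merged['e'].tolist())     # [0, 5, 1]
```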

column names to column, pandas

What is the opposite of the pivot function in Pandas?
For example I have
a b c
1 1 2
2 2 3
3 1 2
What I want
a newcol newcol2
1 b 1
1 c 2
2 b 2
2 c 3
3 b 1
3 c 2
Use pd.melt: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html
import pandas as pd
df = pd.DataFrame({'a':[1,2,3],'b':[1,2,1],'c':[2,3,2]})
pd.melt(df,id_vars=['a'])
Out[8]:
a variable value
0 1 b 1
1 2 b 2
2 3 b 1
3 1 c 2
4 2 c 3
5 3 c 2
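To get exactly the column names and row order shown in the question, var_name/value_name and a sort by a can be added; a sketch:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 1], 'c': [2, 3, 2]})

# var_name/value_name rename melt's default 'variable'/'value' columns,
# and sorting by a restores the row order from the question
out = (pd.melt(df, id_vars=['a'], var_name='newcol', value_name='newcol2')
         .sort_values(['a', 'newcol'])
         .reset_index(drop=True))
print(out['newcol'].tolist())   # ['b', 'c', 'b', 'c', 'b', 'c']
print(out['newcol2'].tolist())  # [1, 2, 2, 3, 1, 2]
```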