Multiply dataframe column by dictionary when match - pandas

I have a dataframe
Brand Value
0 A 2
1 B 5
2 C 6
3 D 1
4 E 4
5 F 6
and a dictionary
dic={C:10}
I want to multiply value by the dictionary value, when there is a match on brand and the key.
So the output is
Brand Value
0 A 2
1 B 5
2 C 60
3 D 1
4 E 4
5 F 6

Use:
df['Value'] = df['Value'].mul(df['Brand'].map(dic)).fillna(df['Value'])
# print(df)
Brand Value
0 A 2.0
1 B 5.0
2 C 60.0
3 D 1.0
4 E 4.0
5 F 6.0

You can do a map with fillna:
df['Value'] *= df['Brand'].map(dic).fillna(1)
Output:
Brand Value
0 A 2.0
1 B 5.0
2 C 60.0
3 D 1.0
4 E 4.0
5 F 6.0

Related

Pandas: new column where value is based on a specific value within subgroup

I have a dataframe where I want to create a new column ("NewValue") where it will take the value from the "Group" with Subgroup = A.
Group SubGroup Value NewValue
0 1 A 1 1
1 1 B 2 1
2 2 A 3 3
3 2 C 4 3
4 3 B 5 NaN
5 3 C 6 NaN
Can this be achieved using a groupby / transform function?
Use Series.map with filtered DataFrame in boolean indexing:
df['NewValue'] = df['Group'].map(df[df.SubGroup.eq('A')].set_index('Group')['Value'])
print (df)
Group SubGroup Value NewValue
0 1 A 1 1.0
1 1 B 2 1.0
2 2 A 3 3.0
3 2 C 4 3.0
4 3 B 5 NaN
5 3 C 6 NaN
Alternative with left join in DataFrame.merge with rename column:
df1 = df.loc[df.SubGroup.eq('A'),['Group','Value']].rename(columns={'Value':'NewValue'})
df = df.merge(df1, how='left')
print (df)
Group SubGroup Value NewValue
0 1 A 1 1.0
1 1 B 2 1.0
2 2 A 3 3.0
3 2 C 4 3.0
4 3 B 5 NaN
5 3 C 6 NaN

Pandas Merge(): Appending data from merged columns and replace null values (Extension from question https://stackoverflow.com/questions/68471939)

I'd like to merge two tables while replacing the null value in one column from one table with the non-null values from the same labelled column from another table.
The code below is an example of the tables to be merged:
# Table 1 (has rows with missing values)
a=['x','x','x','y','y','y']
b=['z', 'z', 'z' ,'w', 'w' ,'w' ]
c=[1 for x in a]
d=[2 for x in a]
e=[3 for x in a]
f=[4 for x in a]
g=[1,1,1,np.nan, np.nan, np.nan]
table_1=pd.DataFrame({'a':a, 'b':b, 'c':c, 'd':d, 'e':e, 'f':f, 'g':g})
table_1
a b c d e f g
0 x z 1 2 3 4 1.0
1 x z 1 2 3 4 1.0
2 x z 1 2 3 4 1.0
3 y w 1 2 3 4 NaN
4 y w 1 2 3 4 NaN
5 y w 1 2 3 4 NaN
# Table 2 (new table to be merged to table_1, and would need to use values in column 'c' to replace values in the same column in table_1, while keeping the values in the other non-null rows)
a=['y', 'y', 'y']
b=['w', 'w', 'w']
g=[2,2,2]
table_2=pd.DataFrame({'a':a, 'b':b, 'g':g})
table_2
a b g
0 y w 2
1 y w 2
2 y w 2
This is the code I use for merging the 2 tables, and the ouput I get
merged_table=pd.merge(table_1, table_2, on=['a', 'b'], how='left')
merged_table
Current output:
a b c d e f g_x g_y
0 x z 1 2 3 4 1.0 NaN
1 x z 1 2 3 4 1.0 NaN
2 x z 1 2 3 4 1.0 NaN
3 y w 1 2 3 4 NaN 2.0
4 y w 1 2 3 4 NaN 2.0
5 y w 1 2 3 4 NaN 2.0
6 y w 1 2 3 4 NaN 2.0
7 y w 1 2 3 4 NaN 2.0
8 y w 1 2 3 4 NaN 2.0
9 y w 1 2 3 4 NaN 2.0
10 y w 1 2 3 4 NaN 2.0
11 y w 1 2 3 4 NaN 2.0
Desired output:
a b c d e f g
0 x z 1 2 3 4 1.0
1 x z 1 2 3 4 1.0
2 x z 1 2 3 4 1.0
3 y w 1 2 3 4 2.0
4 y w 1 2 3 4 2.0
5 y w 1 2 3 4 2.0
There are some problems you have to solve:
Tables 1,2 'g' column type: it should be float. So we use DataFrame.astype({'column_name':'type'}) for both tables 1,2;
Indexes. You are allowed to insert data by index, because other columns of table_1 contain the same data : 'y w 1 2 3 4'. Therefore we should filter NaN values from 'g' column of the table 1: ind=table_1[*pd.isnull*(table_1['g'])] and create a new Series with new indexes from table 1 that cover NaN values from 'g': pd.Series(table_2['g'].to_list(),index=ind.index)
try this solution:
table_1=table_1.astype({'a':'str','b':'str','g':'float'})
table_2=table_2.astype({'a':'str','b':'str','g':'float'})
ind=table_1[pd.isnull(table_1['g'])]
table_1.loc[ind.index,'g']=pd.Series(table_2['g'].to_list(),index=ind.index)
Here is the output.

Compute lagged means per name and round in pandas

I need to compute lagged means per groups in my dataframe. This is how my df looks like:
name value round
0 a 5 3
1 b 4 3
2 c 3 2
3 d 1 2
4 a 2 1
5 c 1 1
0 c 1 3
1 d 4 3
2 b 3 2
3 a 1 2
4 b 5 1
5 d 2 1
I would like to compute lagged means for column value per name and round. That is, for name a in round 3 I need to have value_mean = 1.5 (because (1+2)/2). And of course, there will be nan values when round = 1.
I tried this:
df['value_mean'] = df.groupby('name').expanding().mean().groupby('name').shift(1)['value'].values
but it gives a nonsense:
name value round value_mean
0 a 5 3 NaN
1 b 4 3 5.0
2 c 3 2 3.5
3 d 1 2 NaN
4 a 2 1 4.0
5 c 1 1 3.5
0 c 1 3 NaN
1 d 4 3 3.0
2 b 3 2 2.0
3 a 1 2 NaN
4 b 5 1 1.0
5 d 2 1 2.5
Any idea, how can I do this, please? I found this, but it seems not relevant for my problem: Calculate the mean value using two columns in pandas
You can do that as follows
# sort the values as they need to be counted
df.sort_values(['name', 'round'], inplace=True)
df.reset_index(drop=True, inplace=True)
# create a grouper to calculate the running count
# and running sum as the basis of the average
grouper= df.groupby('name')
ser_sum= grouper['value'].cumsum()
ser_count= grouper['value'].cumcount()+1
ser_mean= ser_sum.div(ser_count)
ser_same_name= df['name'] == df['name'].shift(1)
# finally you just have to set the first entry
# in each name-group to NaN (this usually would
# set the entries for each name and round=1 to NaN)
df['value_mean']= ser_mean.shift(1).where(ser_same_name, np.NaN)
# if you want to see the intermediate products,
# you can uncomment the following lines
#df['sum']= ser_sum
#df['count']= ser_count
df
Output:
name value round value_mean
0 a 2 1 NaN
1 a 1 2 2.0
2 a 5 3 1.5
3 b 5 1 NaN
4 b 3 2 5.0
5 b 4 3 4.0
6 c 1 1 NaN
7 c 3 2 1.0
8 c 1 3 2.0
9 d 2 1 NaN
10 d 1 2 2.0
11 d 4 3 1.5

transform dataframe columns and keep orginals

If I have a dataframe with 3 columns A, B ,C
Is it possible to get the rank for example of these columns but keeping the original.
so df
A B C
3 4 2
1 2 3
df.add_suffix('_Rank')=df.rank(axis=1)
A B C A_Rank B_Rank C_Rank
3 4 2 2 3 1
1 2 3 1 2 3
Use join with add_suffix in right side:
df = df.join(df.rank(axis=1).add_suffix('_Rank'))
Or add _Rank to columns names for new columns:
df[df.columns + '_Rank'] = df.rank(axis=1)
print (df)
A B C A_Rank B_Rank C_Rank
0 3 4 2 2.0 3.0 1.0
1 1 2 3 1.0 2.0 3.0

How to remove duplicates the row wherefore two columns matches?

for my graduation project, I would like to remove duplicate rows and keep only a row where column b and c are equal for the value in column a. I tried a lot of things, groupby, Merge combinations and duplicates, but nothing worked out till now. Can you please help me? Many thanks!
input:
a b c
0 1 A B
1 1 A A
2 1 A C
3 2 B A
4 2 B B
result:
a b c
1 1 A A
4 2 B B
I believe you need:
print (df)
a b c
0 1 A B
1 1 A A
2 1 A C
3 2 B A
4 2 B B
5 3 C C
6 4 C NaN
7 4 C E
7 5 NaN E
Replace NaNs by forward and back filling:
df1 = df[['b','c']].bfill(axis=1).ffill(axis=1)
print (df1)
b c
0 A B
1 A A
2 A C
3 B A
4 B B
5 C C
6 C C
7 C E
7 E E
Check condition in df1 and because same index is possible filter df:
df = df[df1['b'] == df1['c']]
print (df)
a b c
1 1 A A
4 2 B B
5 3 C C
6 4 C NaN
7 5 NaN E