How do I update a column in pandas based on a condition from another dataframe?
I have 2 dataframes, df1 and df2:
import pandas as pd

df1 = pd.DataFrame({'names': ['andi', 'andrew', 'jhon', 'andreas'],
                    'salary': [1000, 2000, 2300, 1500]})
df2 = pd.DataFrame({'names': ['andi', 'andrew'],
                    'raise': [1500, 2500]})
Expected output:
names salary
andi 1500
andrew 2500
jhon 2300
andreas 1500
Use Series.combine_first with DataFrame.set_index:
df = (df2.set_index('names')['raise']
         .combine_first(df1.set_index('names')['salary'])
         .reset_index())
print(df)
names raise
0 andi 1500.0
1 andreas 1500.0
2 andrew 2500.0
3 jhon 2300.0
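Note that the combined column keeps the name raise and the rows come back sorted by names. A small follow-up (a sketch, not part of the original answer) renames the column and restores df1's row order:
df = df.rename(columns={'raise': 'salary'})
df = df.set_index('names').reindex(df1['names']).reset_index()
print(df)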
Using merge and update, similar to SQL:
df3 = pd.merge(df1, df2, how='left', left_on='names', right_on='names')
df3.loc[df3['raise'].notnull(),'salary'] = df3['raise']
df3
names salary raise
0 andi 1500.0 1500.0
1 andrew 2500.0 2500.0
2 jhon 2300.0 NaN
3 andreas 1500.0 NaN
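Another option (a sketch, assuming the same df1 and df2 as above) is to map the raises onto df1 by name and fall back to the existing salary, which keeps the original row order and avoids carrying the extra raise column:
df1['salary'] = df1['names'].map(df2.set_index('names')['raise']).fillna(df1['salary'])
print(df1)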
I have a dataframe, df, with a DatetimeIndex and a single column.
I need to count how many non-zero entries I have in each month. For example, in January I would have 2 entries, in February 1 entry, and in March 2 entries. I have more months in the dataframe, but I guess that explains the problem.
I tried using pandas groupby:
df.groupby(df.index.month).count()
But that just gives me the total number of days in each month, and I don't see any other parameter of count() that I could use here.
Any ideas?
Try index.to_period()
For example:
In [1]: import pandas as pd
        import numpy as np

        x_df = pd.DataFrame(
            {'values': np.random.randint(low=0, high=2, size=(120,))},
            index=pd.date_range("2022-01-01", periods=120, freq="D")
        )
In [2]: x_df
Out[2]:
values
2022-01-01 0
2022-01-02 0
2022-01-03 1
2022-01-04 0
2022-01-05 0
...
2022-04-26 1
2022-04-27 0
2022-04-28 0
2022-04-29 1
2022-04-30 1
[120 rows x 1 columns]
In [3]: x_df[x_df['values'] != 0].groupby(lambda x: x.to_period("M")).count()
Out[3]:
values
2022-01 17
2022-02 15
2022-03 16
2022-04 17
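For reference, a compact variant (a sketch assuming the x_df example above) that skips the filtering step is to sum a boolean mask per month:
monthly_nonzero = x_df['values'].ne(0).groupby(x_df.index.to_period('M')).sum()
print(monthly_nonzero)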
You can try this (where dfx is your dataframe and col1 its column):
import numpy as np

# replace zeros with NaN, drop them, then count the remaining rows per month
dfx['col1'] = dfx['col1'].replace(0, np.nan)
dfx = dfx.dropna()
dfx = dfx.resample('1M').count()
I have a dataframe with an object-dtype column.
How do I append "0" to the end of strings which are of length 4?
Column
12356
1287
23868
5643
45634
You can use str.ljust, which takes as its first argument the width the string should be, and as fillchar the character to pad with (on the right side of the string) if the string is shorter than that width:
df['col'] = df['col'].str.ljust(width=5, fillchar='0')
print(df)
col
0 12356
1 12870
2 23868
3 56430
4 45634
Data setup:
import pandas as pd
df = pd.DataFrame(
    {'col': '12356 1287 23868 5643 45634'.split()},
    dtype='object')
col
0 12356
1 1287
2 23868
3 5643
4 45634
One more method uses a list comprehension:
import pandas as pd
df = pd.DataFrame(
    {'col': '12356 1287 23868 5643 45634'.split()},
    dtype='object')
df['col'] = [i + '0' if len(i) == 4 else i for i in df['col']]
Output:
col
0 12356
1 12870
2 23868
3 56430
4 45634
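A vectorized sketch of the same idea (assuming the df set up above), using Series.mask to replace only the values where the condition holds:
df['col'] = df['col'].mask(df['col'].str.len() == 4, df['col'] + '0')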
I have two dataframes as follows:
df1 =
Index Name Age
0 Bob1 20
1 Bob2 21
2 Bob3 22
The second dataframe is as follows:
df2 =
Index Country Name
0 US Bob1
1 UK Bob123
2 US Bob234
3 Canada Bob2
4 Canada Bob987
5 US Bob3
6 UK Mary1
7 UK Mary2
8 UK Mary3
9 Canada Mary65
I would like to match the names from df1 against the names in df2 to pick up the corresponding countries, and create a new dataframe as follows:
Index Country Name Age
0 US Bob1 20
1 Canada Bob2 21
2 US Bob3 22
Thank you.
Using merge() should solve the problem.
df3 = pd.merge(df1, df2, on='Name')
Full example:
import pandas as pd

df1 = pd.DataFrame({"Name": ["Bob1", "Bob2", "Bob3"], "Age": [20, 21, 22]})
df2 = pd.DataFrame({"Country": ["US", "UK", "US", "Canada", "Canada", "US", "UK", "UK", "UK", "Canada"],
                    "Name": ["Bob1", "Bob123", "Bob234", "Bob2", "Bob987", "Bob3", "Mary1", "Mary2", "Mary3", "Mary65"]})
df3 = pd.merge(df1, df2, on='Name')
df3
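With the data above, the inner merge should keep only Bob1, Bob2 and Bob3. A small follow-up (a sketch, not shown in the original answer) reorders the columns to match the requested layout:
df3 = df3[['Country', 'Name', 'Age']]
print(df3)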
I have made an ad hoc example that you can run, to show you a dataframe similar to the df3 that I have to use:
import pandas as pd

people1 = [['Alex',10],['Bob',12],['Clarke',13],['NaN',],['NaN',],['NaN',]]
people2 = [['NaN',],['NaN',],['NaN',],['Mark',20],['Jane',22],['Jack',23]]
df1 = pd.DataFrame(people1, columns=['Name','Age'])
df2 = pd.DataFrame(people2, columns=['Name','Age'])
people_list = [df1, df2]
df3 = pd.concat((people_list[0]['Name'], people_list[1]['Name']), axis=1)
df3
How would I modify the dataframe df3 to get rid of the NaN values and put the 2 columns next to each other? (I don't care about keeping the indices, I just want a clean dataframe with the 2 columns side by side.)
You can drop the NaN values first:
df3 = pd.concat([df1.dropna(), df2.dropna()])
Output:
Name Age
0 Alex 10.0
1 Bob 12.0
2 Clarke 13.0
3 Mark 20.0
4 Jane 22.0
5 Jack 23.0
Or if you want to concat side by side:
df3 = pd.concat([df1.dropna().reset_index(drop=True), df2.dropna().reset_index(drop=True)], axis=1)
Output:
Name Age Name Age
0 Alex 10.0 Mark 20.0
1 Bob 12.0 Jane 22.0
2 Clarke 13.0 Jack 23.0
If you just want to concat the Name columns side by side:
df3 = pd.concat([df1.dropna().reset_index(drop=True)['Name'], df2.dropna().reset_index(drop=True)['Name']], axis=1)
Output:
Name Name
0 Alex Mark
1 Bob Jane
2 Clarke Jack
If you want to modify only df3, it can be done via iloc and dropna:
df3 = pd.concat([df3.iloc[:, 0].dropna().reset_index(drop=True), df3.iloc[:, 1].dropna().reset_index(drop=True)], axis=1)
Output:
Name Name
0 Alex Mark
1 Bob Jane
2 Clarke Jack
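One caveat (an observation, not from the original answers): in the ad hoc example the missing names are the literal string 'NaN' rather than real missing values, so calling dropna() on df3's Name columns alone will not remove them. A sketch that converts them first, assuming the df3 built in the question:
import numpy as np
df3 = df3.replace('NaN', np.nan)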
people1 = [['Alex',10],['Bob',12],['Clarke',13],['NaN',],['NaN',],['NaN',]]
people2 = [['NaN',],['NaN',],['NaN',],['Mark',20],['Jane',22],['Jack',23]]
df1 = pd.DataFrame(people1, columns=['Name','Age']).dropna()
df2 = pd.DataFrame(people2, columns=['Name','Age']).dropna()
df1.reset_index(drop=True, inplace=True)
df2.reset_index(drop=True, inplace=True)
people_list=[df1, df2]
df3 = pd.concat((people_list[0]['Name'], people_list[1]['Name']), axis=1)
print(df3)
This will help you concatenate the two dataframes.
If I have understood correctly what you mean, this is a possible solution:
people1 = [['Alex',10],['Bob',12],['Clarke',13],['NaN',],['NaN',],['NaN',]]
people2 = [['NaN',],['NaN',],['NaN',],['Mark',20],['Jane',22],['Jack',23]]
df1 = pd.DataFrame(people1, columns=['Name1','Age']).dropna()
df2 = pd.DataFrame(people2, columns=['Name2','Age']).dropna().reset_index()
people_list=[df1, df2]
df3 = pd.concat((people_list[0]['Name1'], people_list[1]['Name2']), axis=1)
print(df3)
Name1 Name2
0 Alex Mark
1 Bob Jane
2 Clarke Jack
If you already have that dataframe:
# count the leading NaNs in Name2, shift that column up by that amount, then drop the now-empty rows
count = df3.Name2.isna().sum()
df3.loc[:, 'Name2'] = df3.Name2.shift(-count)
df3 = df3.dropna()
print(df3)
Name1 Name2
0 Alex Mark
1 Bob Jane
2 Clarke Jack
I have a couple of dataframes. I want to use two columns from the first dataframe to mark the rows that are present in the second dataframe.
The first dataframe (df1) looks like this:
Sup4 Seats Primary Seats Backup Seats
Pa 3 2 1
Ka 2 1 1
Ga 1 0 1
Gee 1 1 0
Re 2 2 0
The second dataframe (df2) looks like:
Sup4 First Last Primary Seats Backup Seats Rating
Pa Peter He NaN NaN 2.3
Ka Sonia Du NaN NaN 2.99
Ga Agnes Bla NaN NaN 3.24
Gee Jeffery Rus NaN NaN 3.5
Gee John Cro NaN NaN 1.3
Pa Pavol Rac NaN NaN 1.99
Pa Ciara Lee NaN NaN 1.88
Re David Wool NaN NaN 2.34
Re Stefan Rot NaN NaN 2
Re Franc Bor NaN NaN 1.34
Ka Tania Le NaN NaN 2.35
The output I require: for each Sup4 name, the rows should be grouped and sorted by Rating from highest to lowest, and then the seat columns should be marked based on the df1 columns Primary Seats and Backup Seats.
I did the grouping and sorting for the first Sup4 name, Pa, as a sample, and I have to do the same for all the names:
Sup4 First Last Primary Seats Backup Seats Rating
Pa Peter He M 2.3
Pa Pavol Rac M 1.99
Pa Ciara Lee M 1.88
Ka Sonia Du M 2.99
Ka Tania Le M 2.35
Ga Agnes Bla M 3.24
... continues like this for the remaining names
I have tried up to the grouping and sorting:
sorted_df = df2.sort_values(['Sup4','Rating'], ascending=[True,False])
However, I need help passing the df1 column values to mark in the second dataframe.
Solution #1:
You can do a merge, but you need to include some logic to update your Seats columns. Also, it is important to mention that you need to decide what to do with data of unequal lengths: Gee and Re have unequal lengths in the two dataframes. More information in Solution #2:
import pandas as pd
import numpy as np

df3 = (pd.merge(df2[['Sup4', 'First', 'Last', 'Rating']], df1, on='Sup4')
         .sort_values(['Sup4', 'Rating'], ascending=[True, False]))
s = df3.groupby('Sup4', sort=False).cumcount() + 1
df3['Backup Seats'] = np.where(s - df3['Primary Seats'] > 0, 'M', '')
df3['Primary Seats'] = np.where(s <= df3['Primary Seats'], 'M', '')
df3 = df3[['Sup4', 'First', 'Last', 'Primary Seats', 'Backup Seats', 'Rating']]
df3
Out[1]:
Sup4 First Last Primary Seats Backup Seats Rating
5 Ga Agnes Bla M 3.24
6 Gee Jeffery Rus M 3.5
7 Gee John Cro M 1.3
3 Ka Sonia Du M 2.99
4 Ka Tania Le M 2.35
0 Pa Peter He M 2.3
1 Pa Pavol Rac M 1.99
2 Pa Ciara Lee M 1.88
8 Re David Wool M 2.34
9 Re Stefan Rot M 2.0
10 Re Franc Bor M 1.34
Solution #2:
After doing this solution, I realized Solution #1 would be much simpler, but I thought I might as well include it. Also, it gives you insight into what to do with values of unequal size in the two dataframes. You can reindex the first dataframe and use combine_first(), but you have to do some preparation. Again, you need to decide what to do with data of unequal lengths. In my answer, I have simply excluded Sup4 groups with unequal lengths to guarantee that the indices align when finally calling combine_first():
# The purpose of `mtch` is to check whether the number of rows per Sup4 in the second dataframe
# equals the seat count in the first; Sup4 groups with unequal lengths in the two dataframes are excluded
mtch = df1.groupby('Sup4')['Seats'].first().eq(df2.groupby('Sup4').size())
df1 = df1.sort_values('Sup4', ascending=True)[df1['Sup4'].isin(mtch[mtch].index)]
# `reindex` the dataframe to one row per seat, get the cumulative count, and mark the seats with `np.where`
df1 = df1.reindex(df1.index.repeat(df1['Seats'])).reset_index(drop=True)
s = df1.groupby('Sup4').cumcount() + 1
df1['Backup Seats'] = np.where(s - df1['Primary Seats'] > 0, 'M', '')
df1['Primary Seats'] = np.where(s <= df1['Primary Seats'], 'M', '')
# as with df1, exclude groups with uneven lengths from df2 and sort
df2 = (df2[df2['Sup4'].isin(mtch[mtch].index)]
.sort_values(['Sup4', 'Rating'], ascending=[True, False]).reset_index(drop=True))
#can use `combine_first` since we have ensured that the data is sorted and of equal lengths in both dataframes
df3 = df2.combine_first(df1)
#order columns and only include required columns
df3 = df3[['Sup4', 'First', 'Last', 'Primary Seats', 'Backup Seats', 'Rating']]
df3
Out[1]:
Sup4 First Last Primary Seats Backup Seats Rating
0 Ga Agnes Bla M 3.24
1 Ka Sonia Du M 2.99
2 Ka Tania Le M 2.35
3 Pa Peter He M 2.3
4 Pa Pavol Rac M 1.99
5 Pa Ciara Lee M 1.88