I have two DataFrames.
df1:
ID X Y Cond
Johnson 2 3 fine
Sand NAN NAN sick
Cooper 1 2 fine
Nelson 1 2 fine
Peterson 4 5 fine
and df2:
id2 X Y
Magic 2 3
Sand 2 3
Cooper 1 2
Dean 1 2
I want to update the X value in df1 where Cond == "sick" and df1["ID"] == df2["id2"],
to get the new df1:
ID X Y Cond
Johnson 2 3 fine
Sand 2 3 sick
Cooper 1 2 fine
Nelson 1 2 fine
Peterson 4 5 fine
I tried:
df1["X"] = np.where((df1["Cond"]=="sick") & (df1["ID"]==df2["id2"]), df2["X"], "")
But it's not working. I get this ValueError:
ValueError: Can only compare identically-labeled Series objects
Thank you
First convert both ID columns to the index, so the matching rows can be selected and assigned with DataFrame.loc:
df11 = df1.set_index('ID')
df22 = df2.set_index('id2')
# select the sick rows; the assignment from df22 aligns on the shared index labels
df11.loc[df11["Cond"] == "sick", ['X', 'Y']] = df22[['X', 'Y']]
df = df11.reset_index()
print(df)
ID X Y Cond
0 Johnson 2 3 fine
1 Sand 2 3 sick
2 Cooper 1 2 fine
3 Nelson 1 2 fine
4 Peterson 4 5 fine
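For reference, here is a self-contained version of the same approach (the DataFrame constructors below are reconstructed from the tables in the question):
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'ID': ['Johnson', 'Sand', 'Cooper', 'Nelson', 'Peterson'],
                    'X': [2, np.nan, 1, 1, 4],
                    'Y': [3, np.nan, 2, 2, 5],
                    'Cond': ['fine', 'sick', 'fine', 'fine', 'fine']})
df2 = pd.DataFrame({'id2': ['Magic', 'Sand', 'Cooper', 'Dean'],
                    'X': [2, 2, 1, 1],
                    'Y': [3, 3, 2, 2]})
df11 = df1.set_index('ID')
df22 = df2.set_index('id2')
# only the sick rows receive X/Y from df22, matched on the index labels
df11.loc[df11['Cond'] == 'sick', ['X', 'Y']] = df22[['X', 'Y']]
print(df11.reset_index())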
You can use the where() method of pandas DataFrames instead of the where function from NumPy. The code looks like this:
df1.loc[:, ["X", "Y"]] = df1.loc[:, ["X", "Y"]].where(df1["Cond"] != "sick", df2.loc[:, ["X", "Y"]])
Note that where() aligns its replacement values on the index, so this works here because Sand sits at the same row position (index 1) in both DataFrames.
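If the matching rows do not share positions, a merge keyed on the IDs avoids the alignment issue. This is a sketch, not taken from either answer, reusing df1 and df2 as constructed in the snippet above:
# pull df2's X/Y next to each df1 row, matched on ID == id2
m = df1.merge(df2, left_on='ID', right_on='id2', how='left', suffixes=('', '_new'))
sick = df1['Cond'].eq('sick')
df1.loc[sick, ['X', 'Y']] = m.loc[sick, ['X_new', 'Y_new']].to_numpy()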
I would like to transform a data frame using pandas.
Old-Dataframe:
Person-ID   Reference-ID   Name
1           1              Max
2           1              Kevin
3           1              Sara
4           4              Chessi
5           9              Fernando
into a new-dataframe in the following format.
New-Dataframe:
Person-ID   Reference-ID   Member1    Member2   Member3
1           1              Max        Kevin     Sara
4           4              Chessi
5           9              Fernando
My solution would be:
Write all the Reference-IDs from the old dataframe into the new dataframe
Write all the Person-IDs whose Reference-ID does not appear as a Person-ID in the old dataframe (see Fernando in the example)
Loop through the old dataframe and add each name to the corresponding line in the new dataframe
Do you have any suggestions, on how to make this faster/simpler?
PS: The old-dataframe can be made like this:
import pandas as pd
person_id = [1, 2, 3, 4, 5]
reference_id = [1, 1, 1, 4, 9]
name = ['Max', 'Kevin', 'Sara', 'Chessi', 'Fernando']
list_tuples = list(zip(person_id, reference_id, name))
old_dataframe = pd.DataFrame(list_tuples, columns=['Person-ID', 'Reference-ID', 'Name'])
You can use pivot_table() like this:
df1 = pd.pivot_table(old_dataframe, index=['Reference-ID'], values=['Person-ID', 'Name'],
                     aggfunc={'Person-ID': 'min', 'Name': lambda x: list(x)})
df1.reset_index()[['Person-ID', 'Reference-ID']].join(pd.DataFrame(df1.Name.tolist()))
Output:
Person-ID   Reference-ID   0          1       2
1           1              Max        Kevin   Sara
4           4              Chessi     None    None
5           9              Fernando   None    None
You can reassign column names like this:
df2 = df1.reset_index()[['Person-ID', 'Reference-ID']].join(pd.DataFrame(df1.Name.tolist()))
df2.columns = list(df2.columns[0:2]) + [f"Member{x+1}" for x in df2.columns[2:]]
Output:
Person-ID   Reference-ID   Member1    Member2   Member3
1           1              Max        Kevin     Sara
4           4              Chessi     None      None
5           9              Fernando   None      None
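An alternative sketch of the same reshaping uses groupby().cumcount() to number the members within each Reference-ID group. The approach is not from the answer above, and the constructor simply restates the sample data:
import pandas as pd
old_dataframe = pd.DataFrame(
    {'Person-ID': [1, 2, 3, 4, 5],
     'Reference-ID': [1, 1, 1, 4, 9],
     'Name': ['Max', 'Kevin', 'Sara', 'Chessi', 'Fernando']})
# number each name within its Reference-ID group: 1, 2, 3, ...
tmp = old_dataframe.assign(member=old_dataframe.groupby('Reference-ID').cumcount() + 1)
wide = tmp.pivot(index='Reference-ID', columns='member', values='Name')
wide.columns = [f'Member{c}' for c in wide.columns]
# keep the smallest Person-ID per group, as in the output above
heads = old_dataframe.groupby('Reference-ID', as_index=False)['Person-ID'].min()
result = heads.merge(wide, left_on='Reference-ID', right_index=True)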
How to concat/append based on common column values?
I'm creating some dfs from some files, and I want to compile them.
The columns don't always match, but there will always be some common columns. (I only know a few columns that are guaranteed to match; there are many columns overall, and I'd like to retain as much info as possible.)
df1:
Name   Status
John   1
Jane   2
df2:
Extra1   Extra2   Name    Status
a        b        Bob     2
c        d        Nancy   2
Desired output:
either this (the order doesn't matter):
Extra1   Extra2   Name    Status
a        b        Bob     2
c        d        Nancy   2
NULL     NULL     John    1
NULL     NULL     Jane    2
or this (the order doesn't matter):
Name    Status
John    1
Jane    2
Bob     2
Nancy   2
I've tried these, but they don't give the result I want:
df = pd.concat([df2, df], axis=0, ignore_index=True)
df = df.set_index('Name').combine_first(df2.set_index('Name')).reset_index()
Thanks
Not sure why the tables aren't being formatted; they show up fine in the preview.
import pandas as pd
df1 = pd.DataFrame({'Name':['John', 'Jane'],'Status':[1,2]})
df2 = pd.DataFrame({'Extra1':['a','c'],'Extra2':['b','d'],'Name':['bob', 'nancy'],'Status':[2,2]})
df = pd.concat([df1,df2], axis=0, ignore_index=True)
Gives me
Name    Status   Extra1   Extra2
John    1        NaN      NaN
Jane    2        NaN      NaN
bob     2        a        b
nancy   2        c        d
Which looks to me like your desired output.
And your tables aren't formatted correctly because you need blank lines between text and tables.
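For your second desired output (only the shared columns), concat also supports join='inner', which keeps just the columns common to all inputs. A minimal sketch using the same frames:
import pandas as pd
df1 = pd.DataFrame({'Name':['John', 'Jane'],'Status':[1,2]})
df2 = pd.DataFrame({'Extra1':['a','c'],'Extra2':['b','d'],'Name':['bob', 'nancy'],'Status':[2,2]})
# join='inner' intersects the columns instead of taking their union
common = pd.concat([df1, df2], axis=0, join='inner', ignore_index=True)
print(common)   # only Name and Status survive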
I have this data set for example:
Name Number Is true
0 Dani 2 yes
1 Dani 2 no
2 Jack 5 no
3 Jack 5 maybe
4 Dani 2 maybe
I want to create a new data set that combines similar rows and spreads the differing values across new columns. This is the output I'm trying to get:
Name Number Is true1 Is true2 Is true3
0 Dani 2 yes no maybe
1 Jack 5 no maybe
I couldn't get it working from example 10 here:
How to pivot a dataframe
Would you be able to provide a specific example for this use case please?
Thanks.
Edit, in response:
Name yes no maybe
0 Dani 2 2 2
1 Jack NaN 5 5
With a combination of pivot_table() and apply():
(df.pivot_table(index=["Name", "Number"], values="Is true", aggfunc=list)
   .apply(lambda x: pd.Series({f"Is true{i+1}": el for i, el in enumerate(x["Is true"])}), axis=1)
   .reset_index())
Output:
Name Number Is true1 Is true2 Is true3
0 Dani 2 yes no maybe
1 Jack 5 no maybe NaN
Edit: for your follow-up, this might be along the lines of what you're looking for:
(df.pivot_table(index=["Name"], columns="Is true", values="Number", aggfunc=list)
   .fillna('')
   .apply(lambda x: pd.Series({f"{col}{i+1}": el for col in x.keys() for i, el in enumerate(x[col])}), axis=1)
   .reset_index())
Output:
Name maybe1 no1 yes1
0 Dani 2.0 2.0 2.0
1 Jack 5.0 5.0 NaN
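An alternative sketch for the first shape numbers the repeats with groupby().cumcount() and then unstacks. The approach is not from the answer above, and the constructor simply restates the sample data:
import pandas as pd
df = pd.DataFrame({'Name': ['Dani', 'Dani', 'Jack', 'Jack', 'Dani'],
                   'Number': [2, 2, 5, 5, 2],
                   'Is true': ['yes', 'no', 'no', 'maybe', 'maybe']})
# k numbers the occurrences within each (Name, Number) pair: 1, 2, 3, ...
k = df.groupby(['Name', 'Number']).cumcount() + 1
out = df.set_index(['Name', 'Number', k])['Is true'].unstack()
out.columns = [f'Is true{c}' for c in out.columns]
out = out.reset_index()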
You can try this:
df2 = df.drop_duplicates(subset=['Name', 'Number'])
df2 = df2.reset_index(drop=True).assign(**{'Is true': df.groupby('Number')['Is true'].agg(list).reset_index(drop=True)})
temp = df2['Is true'].apply(pd.Series).T
temp.index = temp.index + 1
temp = temp.T
df2 = df2.assign(**temp.add_prefix('Is true')).drop(columns='Is true').fillna('')
output:
Name Number Is true1 Is true2 Is true3
0 Dani 2 yes no maybe
1 Jack 5 no maybe
(Note this relies on the sorted groupby keys lining up with the order of first appearances, which they do for this data.)
I want to aggregate 3 DataFrames, but instead of adding them together I want to multiply them. Is there a way to do it?
i.e.
df = result.groupby(['name']).agg({'A': 'sum', 'B': 'sum'})
df1
A B
tim 1 5
emma 3 7
df2
A B
tim 1 8
emma 1 2
result
A B
tim 2 13
emma 4 9
Instead of summing the two, I want to multiply them:
A B
tim 1 40
emma 3 14
Use GroupBy.prod:
df = result.groupby(['name']).agg({'A': 'prod', 'B': 'prod'})
If you also need to combine df1 and df2 first:
df = pd.concat([df1, df2]).groupby('name', as_index=False).prod()
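A minimal runnable sketch, assuming (as in the tables above) that the names are the index rather than a 'name' column, in which case you group on the index level:
import pandas as pd
df1 = pd.DataFrame({'A': [1, 3], 'B': [5, 7]}, index=['tim', 'emma'])
df2 = pd.DataFrame({'A': [1, 1], 'B': [8, 2]}, index=['tim', 'emma'])
# stack the frames, then take the product within each index label
result = pd.concat([df1, df2]).groupby(level=0).prod()
print(result)
#        A   B
# emma   3  14
# tim    1  40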
I have a DataFrame that is the concatenation of at least two other DataFrames:
i.e.
df1
Name | Type | ID
0 Joe A 1
1 Fred B 2
2 Mike Both 3
3 Frank Both 4
df2
Name | Type | ID
0 Bill Both 1
1 Jill Both 2
2 Mill B 3
3 Hill A 4
ConcatDf:
Name | Type | ID
0 Joe A 1
1 Fred B 2
2 Mike Both 3
3 Frank Both 4
0 Bill Both 1
1 Jill Both 2
2 Mill B 3
3 Hill A 4
Suppose after they are concatenated, I'd like to set Type for all records from df1 to C and all records from df2 to B. Is this possible?
The indices of the dataframes can be vastly different sizes.
Thanks in advance.
df3 = pd.concat([df1, df2], keys=(1, 2))
df3.loc[1, 'Type'] = 'C'
When you concat you can assign keys to the dfs. This creates a MultiIndex, with the keys separating the concatenated dfs. Then you can pass a key to .loc to select that group. In the code above, we change the Type of every row from df1 (which has key 1) to C.
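A self-contained sketch of that approach, including the second group (the constructors simply restate the sample frames):
import pandas as pd
df1 = pd.DataFrame({'Name': ['Joe', 'Fred', 'Mike', 'Frank'],
                    'Type': ['A', 'B', 'Both', 'Both'], 'ID': [1, 2, 3, 4]})
df2 = pd.DataFrame({'Name': ['Bill', 'Jill', 'Mill', 'Hill'],
                    'Type': ['Both', 'Both', 'B', 'A'], 'ID': [1, 2, 3, 4]})
df3 = pd.concat([df1, df2], keys=(1, 2))
df3.loc[1, 'Type'] = 'C'   # every row that came from df1
df3.loc[2, 'Type'] = 'B'   # every row that came from df2
df3 = df3.droplevel(0)     # drop the key level to recover ConcatDf's index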
Use merge with indicator=True to find which rows belong to df1 or df2. Next, use np.where to assign C or B.
# rows found in df1 merge as 'both' -> 'C'; rows only in concatdf (i.e. from df2) -> 'left_only' -> 'B'
t = concatdf.merge(df1, how='left', on=concatdf.columns.tolist(), indicator=True)
concatdf['Type'] = np.where(t._merge.eq('left_only'), 'B', 'C')
Output:
Name Type ID
0 Joe C 1
1 Fred C 2
2 Mike C 3
3 Frank C 4
0 Bill B 1
1 Jill B 2
2 Mill B 3
3 Hill B 4
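If you control the concatenation step itself, it can be simpler to set the column per source frame before concatenating. A sketch, reusing df1 and df2 as constructed in the earlier snippet:
# assign() returns a modified copy, so df1 and df2 themselves are untouched
concat_df = pd.concat([df1.assign(Type='C'), df2.assign(Type='B')])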