How to pull a specific value from one dataframe into another? - pandas

I have two dataframes
How would one populate the values in bold from df1 into the column 'Value' in df2?

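Use melt on df1 before merging your two dataframes:
tmp = df1.melt('Rating', var_name='Category', value_name='Value2')
# how='left' keeps every row of df2 in its original order, so the result lines up with df2's default index
df2['Value'] = df2.merge(tmp, on=['Rating', 'Category'], how='left')['Value2']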
print(df2)
# Output
Category Rating Value
0 Hospitals A++ 2.5
1 Education AA 2.1
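Since the original tables are not shown here, the following is only a sketch of the melt-then-merge idea on made-up frames. The column layout of df1 (one 'Rating' column plus one column per category) and every number other than 2.5 and 2.1 are assumptions for illustration.

```python
import pandas as pd

# Hypothetical wide table: one row per rating, one column per category.
df1 = pd.DataFrame({
    "Rating": ["A++", "AA"],
    "Hospitals": [2.5, 3.0],
    "Education": [1.9, 2.1],
})
# Hypothetical lookup table: the (Category, Rating) pairs we want values for.
df2 = pd.DataFrame({
    "Category": ["Hospitals", "Education"],
    "Rating": ["A++", "AA"],
})

# Reshape df1 to long form: one row per (Rating, Category) pair.
tmp = df1.melt("Rating", var_name="Category", value_name="Value")

# A left merge keeps every row of df2, in order, and attaches the matching value.
result = df2.merge(tmp, on=["Rating", "Category"], how="left")
print(result)
```

Returning the merged frame, rather than assigning a single column back, avoids depending on index alignment between df2 and the merge result.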

Related

Concatenate single row dataframe with multiple row dataframe

I have a dataframe with a large number of columns but a single row, df1:
Col1 Col2 Price Qty
A B 16 5
I have another dataframe as follows, df2:
Price Qty
8 2.5
16 5
6 1.5
I want to achieve the following:
Col1 Col2 Price Qty
A B 8 2.5
A B 16 5
A B 6 1.5
Essentially, I want to repeat the single row of df1 for every row of df2, but take the Price and Qty columns from df2 in place of the ones originally present in df1.
I am not sure how to proceed with the above.
I believe the following approach will work:
import numpy as np
import pandas as pd
# first, repeat the single row of df1 as many times as there are rows in df2
df1 = pd.DataFrame(np.repeat(df1.values, len(df2.index), axis=0), columns=df1.columns)
# reset the indexes of both DataFrames just to be safe;
# drop=True discards the old index instead of adding it as an 'index' column
df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)
# now merge the two DataFrames on the index,
# after dropping the Price and Qty columns from df1
df3 = pd.merge(df1.drop(['Price', 'Qty'], axis=1), df2, left_index=True, right_index=True)
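As an alternative sketch, not the answer's own method, the same result can be reached with a cross join. This assumes pandas 1.2 or later, where merge accepts how="cross"; the frames below are rebuilt from the question's sample data.

```python
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame({"Col1": ["A"], "Col2": ["B"], "Price": [16], "Qty": [5.0]})
df2 = pd.DataFrame({"Price": [8, 16, 6], "Qty": [2.5, 5.0, 1.5]})

# Drop the columns that df2 will supply, then pair the single remaining row
# of df1 with every row of df2 (a cross join).
df3 = df1.drop(columns=["Price", "Qty"]).merge(df2, how="cross")
print(df3)
```

Because df1 has exactly one row, the cross join yields one output row per row of df2, and the original column dtypes are kept.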

How do I offset a dataframe with values in another dataframe?

I have two dataframes. One holds the base values (df) and the other holds offsets (df2).
How do I create a third dataframe that is the first dataframe offset by the matching values (by ID) in the second dataframe?
This post doesn't seem to do the offset: Update only some values in a dataframe using another dataframe
import pandas as pd
# initialize list of lists
data = [['1092', 10.02], ['18723754', 15.76], ['28635', 147.87]]
df = pd.DataFrame(data, columns = ['ID', 'Price'])
offsets = [['1092', 100.00], ['28635', 1000.00], ['88273', 10.]]
df2 = pd.DataFrame(offsets, columns = ['ID', 'Offset'])
print (df)
print (df2)
>>> print (df)
ID Price
0 1092 10.02
1 18723754 15.76 # no offset to affect it
2 28635 147.87
>>> print (df2)
ID Offset
0 1092 100.00
1 28635 1000.00
2 88273 10.00 # < no match
This is what I want to produce: the price has been offset wherever the IDs match
ID Price
0 1092 110.02
1 18723754 15.76
2 28635 1147.87
I've also looked at Pandas Merging 101
I don't want to add columns to the dataframe, and I don't want to just replace column values with values from another dataframe.
What I want is to add (sum) column values from the other dataframe to this dataframe, where the IDs match.
The closest I come is df_add=df.reindex_like(df2) + df2 but the problem is that it sums all columns - even the ID column.
Try this:
df['Price'] = pd.merge(df, df2, on=["ID"], how="left")[['Price','Offset']].sum(axis=1)
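A variant that avoids the merge, offered as a sketch rather than a replacement for the line above: look each ID up in a Series of offsets and treat missing matches as zero. It assumes the IDs in df2 are unique; the name offset_by_id is made up for this example.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    [["1092", 10.02], ["18723754", 15.76], ["28635", 147.87]],
    columns=["ID", "Price"],
)
df2 = pd.DataFrame(
    [["1092", 100.00], ["28635", 1000.00], ["88273", 10.0]],
    columns=["ID", "Offset"],
)

# Series indexed by ID, so it can serve as a lookup table (IDs assumed unique).
offset_by_id = df2.set_index("ID")["Offset"]

# IDs without an offset map to NaN; fillna(0) leaves those prices unchanged.
df["Price"] = df["Price"] + df["ID"].map(offset_by_id).fillna(0)
print(df)
```

Since map works row by row on df itself, the result keeps df's index whatever it is, and offsets with no matching ID in df (such as 88273) are simply ignored.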

Create new column based of two columns

I have two columns in a dataframe. I want to create a third column that is 1 if the first column is greater than the second column, and 0 otherwise, as below:
Df
Value1 value 2 Newcolumn
100 101 0
101 97 1
To compare two columns in a pandas DataFrame and write the result of the comparison to a third column, you can use the following syntax:
conditions=[(condition1),(condition2)]
choices=["choice1","choice2"]
df["new_column_name"]=np.select(conditions, choices, default)
conditions are the conditions to check for between the two columns
choices are the results to return based on the conditions
np.select is used to return the results to the new column
The dataframe is:
import numpy as np
import pandas as pd
#create DataFrame
df = pd.DataFrame({'Value1': [100,101],
'value 2': [101,97]})
#define conditions
conditions = [df['Value1'] < df['value 2'],
df['Value1'] > df['value 2']]
#define choices
choices = ['0', '1']
#create new column in DataFrame that displays results of comparisons
df['Newcolumn'] = np.select(conditions, choices, default='Tie')
Final dataframe
print(df)
Output:
Value1 value 2 Newcolumn
0 100 101 0
1 101 97 1
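For a plain two-way split, np.where may be enough. The sketch below uses the same sample data and produces integer 1/0 rather than strings; note that ties fall into the 0 branch here, which is an assumption about what the asker wants.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Value1": [100, 101], "value 2": [101, 97]})

# 1 where Value1 is strictly greater than 'value 2', otherwise 0 (ties give 0).
df["Newcolumn"] = np.where(df["Value1"] > df["value 2"], 1, 0)
print(df)
```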

Pandas dataframe replace contents based on ID from another dataframe

This is what my main dataframe looks like:
Group IDs New ID
1 [N23,N1,N12] N102
2 [N134,N100] N501
I have another dataframe that has all the required ID info in an unordered manner:
ID Name Age
N1 Milo 5
N23 Mark 21
N11 Jacob 22
I would like to modify the original dataframe such that all IDs are replaced with their respective names obtained from the other file. So that the dataframe has only names and no IDs and looks like this:
Group IDs New ID
1 [Mark,Silo,Bond] Niki
2 [Troy,Fangio] Kvyat
Thanks in advance
IIUC, you can .explode your lists, replace the values with .map and regroup them with .groupby:
df['IDs'] = (df['IDs'].explode()
             .map(df1.set_index('ID')['Name'])
             .groupby(level=0).agg(list)
            )
If the New ID column does not hold lists, .map() alone is enough:
df['New ID'] = df['New ID'].map(df1.set_index('ID')['Name'])
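The two steps can be tried end to end with the sketch below. The frames are hypothetical: the IDs column holds real Python lists, every ID is present in the lookup table, and the 'New ID' values were changed so that they can be looked up. It also assumes df has a unique index, since groupby(level=0) regroups by index label.

```python
import pandas as pd

# Hypothetical main frame; 'IDs' contains actual lists, not strings.
df = pd.DataFrame({
    "Group": [1, 2],
    "IDs": [["N23", "N1", "N11"], ["N11", "N23"]],
    "New ID": ["N1", "N23"],
})
# Lookup frame from the question.
df1 = pd.DataFrame({
    "ID": ["N1", "N23", "N11"],
    "Name": ["Milo", "Mark", "Jacob"],
    "Age": [5, 21, 22],
})

# Series indexed by ID, used as the lookup table for .map.
name_by_id = df1.set_index("ID")["Name"]

# One row per ID, translate to names, then gather back into one list per original row.
df["IDs"] = df["IDs"].explode().map(name_by_id).groupby(level=0).agg(list)

# Scalar column: a plain .map is sufficient.
df["New ID"] = df["New ID"].map(name_by_id)
print(df)
```

Any ID missing from df1 would come back as NaN inside the lists, so it may be worth checking for gaps before overwriting the column.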
You can try making a dict from your second DF and then replacing on the first using regex patterns (no need to fully understand them, check the comments below):
PS: since you didn't provide the full df with the codes, I created one with only some of them, which is why the print() won't show all the IDs replaced.
import pandas as pd
# creating dummy dfs
df1 = pd.DataFrame({"Group":[1,2], "IDs":["[N23,N1,N12]", "[N134,N100]"], "New ID":["N102", "N501"] })
df2 = pd.DataFrame({"ID":['N1', "N23", "N11", "N100"], "Name":["Milo", "Mark", "Jacob", "Silo"], "Age":[5,21,22, 44]})
# Build the lookup dict; regex patterns are used so that only exact IDs match
dict_replace = df2.set_index("ID")['Name'].to_dict()
# the 'f' prefix makes an f-string and the 'r' prefix makes a raw string, so backslashes are kept as-is
# \b is the regex word-boundary pattern; it marks the beginning and end of the match
## so that if you're searching for N1, it won't match N11
dict_replace = {fr"\b{k}\b":v for k, v in dict_replace.items()}
# Replace in the original column; assigning the result back avoids relying on inplace=True on a column selection
df1['IDs'] = df1['IDs'].replace(dict_replace, regex=True)
print(df1['IDs'].tolist())
# >>> ['[Mark,Milo,N12]', '[N134,Silo]']
Please note the change in my dataframes. In your sample data, some IDs in df do not exist in df1. I altered df so that it contains only IDs that are present in df1. I use the following df:
print(df)
Group IDs New
0 1 [N23,N1,N11] N102
1 2 [N11,N23] N501
print(df1)
ID Name Age
0 N1 Milo 5
1 N23 Mark 21
2 N11 Jacob 22
Solution
Build a dict from df1.ID and df1.Name, map it onto the exploded df.IDs, and collect the result back into lists.
df['IDs'] = df['IDs'].str.strip('[]')  # strip the square brackets
df['IDs'] = df['IDs'].str.split(',')  # rebuild the lists; needed because the lists could not be exploded directly (presumably they were stored as strings)
# explode the lists, map each ID to its name via df1, and collect the names per group
df.explode('IDs').groupby('Group')['IDs'].apply(lambda x: x.map(dict(zip(df1.ID, df1.Name))).tolist()).reset_index()
Group IDs
0 1 [Mark, Milo, Jacob]
1 2 [Jacob, Mark]

iterating over a dictionary of empty pandas dataframes to append them with data from existing dataframe based on list of column names

I'm a biologist and very new to Python (I use v3.5) and pandas. I have a pandas dataframe (df), from which I need to make several dataframes (df1... dfn) that can be placed in a dictionary (dictA), which currently has the correct number (n) of empty dataframes. I also have a dictionary (dictB) of n (individual) lists of column names that were extracted from df. The keys in the 2 dictionaries match. I'm trying to fill the empty dfs within dictA with parts of df based on the column names within the lists in dictB.
import pandas as pd
listA=['A', 'B', 'C',...]
dictA={i:pd.DataFrame() for i in listA}
Let's say I have something like this:
dictA={'A': df1, 'B': df2}
dictB={'A': ['A1', 'A2', 'A3'],
       'B': ['B1', 'B2']}
df=pd.DataFrame({'A1': [0,2,4,5],
'A2': [2,5,6,7],
'A3': [5,6,7,8],
'B1': [2,5,6,7],
'B2': [1,3,5,6]})
listA=['A', 'B']
what I'm trying to get is for df1 and df2 to get appended with portions of df like this, so that the output for df1 is like this:
A1 A2 A3
0 0 2 5
1 2 5 6
2 4 6 7
3 5 7 8
df2 would have columns B1 and B2.
I tried the following loop and some alterations, but it doesn't yield populated dfs:
for key, values in dictA.items():
    values.append(df[dictB[key]])
Thanks and sorry if this was already addressed elsewhere but I couldn't find it.
You could create the dataframes you want like this instead:
# df is your original dataframe containing all the columns
df_A = df[[col for col in df if 'A' in col]]
df_B = df[[col for col in df if 'B' in col]]
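To keep the results in a dictionary keyed like dictB, a dict comprehension is one possible sketch. It builds dictA directly from df instead of starting from empty frames, which is an assumption about what is acceptable here. The loop in the question could not work in any case: DataFrame.append returned a new frame rather than modifying the existing one, and it was removed in pandas 2.0.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "A1": [0, 2, 4, 5],
    "A2": [2, 5, 6, 7],
    "A3": [5, 6, 7, 8],
    "B1": [2, 5, 6, 7],
    "B2": [1, 3, 5, 6],
})
dictB = {"A": ["A1", "A2", "A3"], "B": ["B1", "B2"]}

# For each key, select that key's columns from df; .copy() makes each
# sub-frame independent of the original.
dictA = {key: df[cols].copy() for key, cols in dictB.items()}

print(dictA["A"])
print(dictA["B"])
```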