I have two pandas DataFrames in the following formats:
df1:
Code    Temp  tmp_Code  tmp_Age
ABCDFG  NaN   ABCDF     NaN
ABCDEF  15    ABCDE     NaN
df2:
Code   Temp
ABCDF  18
ABCDL  21
I am trying to merge the two DataFrames by matching tmp_Code in df1 with Code in df2. Where there is a match, the value from df2['Temp'] should be filled into df1['tmp_Age']. I was able to do the join, but I am not sure how to assign the result to df1['tmp_Age'].
Code I tried:
df['tmp_Age'] = pd.merge(df[['tmp_Code','Temp']], df2[['Code','Temp']],left_on='tmp_Code',right_on='Code',how='left')
Desired output:
Code    Temp  tmp_Code  tmp_Age
ABCDFG  NaN   ABCDF     18
ABCDEF  15    ABCDE     NaN
Any suggestions would be appreciated.
Select the Temp_y column from the merge result (pandas appends the _x/_y suffixes because both frames have a Temp column) and assign it as the new column:
df['tmp_Age'] = pd.merge(df[['tmp_Code','Temp']], df2[['Code','Temp']],
left_on='tmp_Code', right_on='Code',
how='left')['Temp_y'] # <- HERE
print(df)
# Output:
Code Temp tmp_Code tmp_Age
0 ABCDFG NaN ABCDF 18.0
1 ABCDEF 15.0 ABCDE NaN
An alternative way to bring a single column over from another DataFrame is Series.map:
df['tmp_Age'] = df['tmp_Code'].map(df2.set_index('Code')['Temp'])
print(df)
# Output:
Code Temp tmp_Code tmp_Age
0 ABCDFG NaN ABCDF 18.0
1 ABCDEF 15.0 ABCDE NaN
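For completeness, a minimal, self-contained sketch of the map approach, assuming the example frames shown in the question (the literal values are just the ones from the question):
import numpy as np
import pandas as pd

df = pd.DataFrame({'Code': ['ABCDFG', 'ABCDEF'],
                   'Temp': [np.nan, 15],
                   'tmp_Code': ['ABCDF', 'ABCDE']})
df2 = pd.DataFrame({'Code': ['ABCDF', 'ABCDL'],
                    'Temp': [18, 21]})
# Look up each tmp_Code in df2 (Code -> Temp); unmatched codes become NaN.
df['tmp_Age'] = df['tmp_Code'].map(df2.set_index('Code')['Temp'])
print(df)
# tmp_Age becomes 18.0 for ABCDF and NaN for ABCDE.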
My question was too generic, so here is another try.
I want a DataFrame with monthly dates in the first column; then I want to go through the dates and fill in values for the other two columns.
import pandas as pd

# Try to generate a DataFrame with dates.
# This is the DataFrame, but how can I fill in the dates?
dfa = pd.DataFrame(columns=['date', '1G', '10G'])
print(dfa)
# These are the dates, but how do I get them into the DataFrame,
# and how do I add values to the empty cells?
idx = pd.date_range("2016-01-01", periods=55, freq="M")
ts = pd.Series(range(len(idx)), index=idx)
print(ts)
If you need a column filled with datetimes:
import numpy as np

dfa = pd.DataFrame({'date': pd.date_range("2016-01-01", periods=55, freq="M"),
                    '1G': np.nan,
                    '10G': np.nan})
print (dfa.head())
date 1G 10G
0 2016-01-31 NaN NaN
1 2016-02-29 NaN NaN
2 2016-03-31 NaN NaN
3 2016-04-30 NaN NaN
4 2016-05-31 NaN NaN
Or if you need a DatetimeIndex:
dfa = pd.DataFrame({'1G': np.nan,
                    '10G': np.nan},
                   index=pd.date_range("2016-01-01", periods=55, freq="M"))
print (dfa.head())
1G 10G
2016-01-31 NaN NaN
2016-02-29 NaN NaN
2016-03-31 NaN NaN
2016-04-30 NaN NaN
2016-05-31 NaN NaN
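The question also asks how to fill values into the empty cells afterwards. A minimal sketch, assuming values are assigned per date with .loc (the numbers below are purely illustrative):
import numpy as np
import pandas as pd

dfa = pd.DataFrame({'1G': np.nan, '10G': np.nan},
                   index=pd.date_range("2016-01-01", periods=55, freq="M"))
# Assign values for specific dates by label; dates not mentioned stay NaN.
dfa.loc['2016-01-31', '1G'] = 1.5
dfa.loc['2016-02-29', ['1G', '10G']] = [2.0, 20.0]
print(dfa.head(3))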
I am new to pandas and I am stuck on a specific problem where I have two DataFrames, e.g.
>>> df1
A B
0 1 9
1 2 6
2 3 11
3 4 8
>>> df2
A B
0 NaN 0.05
1 NaN 0.05
2 0.16 NaN
3 0.16 NaN
What I am trying to achieve is to retain all values from df1 except where there is a NaN in df2 i.e.
>>> df3
A B
0 NaN 9
1 NaN 6
2 3 NaN
3 4 NaN
I am talking about dfs with 10,000 rows each so I can't do this manually. Also indices and columns are the exact same in each case. I also have no NaN values in df1.
As far as I understand df.update() will either overwrite all values including NaN or update only those that are NaN.
You can use boolean masking with DataFrame.notna:
# df2 = df2.astype(float)  # This is needed if your dtypes are not floats.
m = df2.notna()
df1[m]
A B
0 NaN 9.0
1 NaN 6.0
2 3.0 NaN
3 4.0 NaN
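An equivalent, arguably more explicit form is DataFrame.where, which keeps a value where the condition is True and inserts NaN elsewhere. A sketch assuming the same df1/df2 as above:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [9, 6, 11, 8]})
df2 = pd.DataFrame({'A': [np.nan, np.nan, 0.16, 0.16],
                    'B': [0.05, 0.05, np.nan, np.nan]})
# Keep df1's value where df2 is not NaN, otherwise NaN.
df3 = df1.where(df2.notna())
print(df3)
#      A    B
# 0  NaN  9.0
# 1  NaN  6.0
# 2  3.0  NaN
# 3  4.0  NaN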
I need to regroup a df from the format above into the one below, but my attempt fails and the output shape is (unique number of IDs, 2). Is there a more obvious solution?
You can use groupby and pivot:
# Number the duplicates within each ID, then spread them into columns.
(df.assign(n=df.groupby('ID').cumcount().add(1))
   .pivot(index='ID', columns='n', values='Value')
   .add_prefix('val_')
   .reset_index()
)
Example input:
df = pd.DataFrame({'ID': [7,7,8,11,12,18,22,22,22],
'Value': list('abcdefghi')})
Output:
n ID val_1 val_2 val_3
0 7 a b NaN
1 8 c NaN NaN
2 11 d NaN NaN
3 12 e NaN NaN
4 18 f NaN NaN
5 22 g h i
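A roughly equivalent alternative, if you prefer unstack over pivot (a sketch using the same example df; the occurrence counter is built the same way):
import pandas as pd

df = pd.DataFrame({'ID': [7, 7, 8, 11, 12, 18, 22, 22, 22],
                   'Value': list('abcdefghi')})
# Build an (ID, occurrence-number) MultiIndex, then move the number level to columns.
out = (df.set_index(['ID', df.groupby('ID').cumcount().add(1)])['Value']
         .unstack()
         .add_prefix('val_')
         .reset_index())
print(out)  # same table as the output above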
I need to change the following DataFrame, in which one column contains a list of tuples,
df = pd.DataFrame({'columns1':list('AB'),'columns2':[1,2],
'columns3':[[(122,0.5), (104, 0)], [(104, 0.6)]]})
print (df)
columns1 columns2 columns3
0 A 1 [(122, 0.5), (104, 0)]
1 B 2 [(104, 0.6)]
into this, in which the first element of each tuple should become a column header:
columns1 columns2 104 122
0 A 1 0.0 0.5
1 B 2 0.6 NaN
How can I do this using pandas in a Jupyter notebook?
Use a list comprehension to convert the values to dictionaries, sort the columns, and join the result back to the original with DataFrame.join:
df = pd.read_csv('Sample - Sample.csv.csv')
print (df)
column1 column2 column3
0 A U1 [(187, 0.674), (111, 0.738)]
1 B U2 [(54, 1.0)]
2 C U3 [(169, 0.474), (107, 0.424), (88, 0.519), (57,...
import ast
# column3 holds strings like "[(187, 0.674), (111, 0.738)]": parse each,
# turn the pairs into a dict, and build one row per dict.
df1 = pd.DataFrame([dict(ast.literal_eval(x)) for x in df.pop('column3')],
                   index=df.index).sort_index(axis=1)
df = df.join(df1)
print (df)
column1 column2 54 57 64 88 107 111 169 187
0 A U1 NaN NaN NaN NaN NaN 0.738 NaN 0.674
1 B U2 1.0 NaN NaN NaN NaN NaN NaN NaN
2 C U3 NaN 0.526 0.217 0.519 0.424 NaN 0.474 NaN
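If column3 already holds real Python lists of tuples (as in the small example at the top of the question) rather than strings read from a CSV, the ast step can be dropped. A minimal sketch:
import pandas as pd

df = pd.DataFrame({'columns1': list('AB'), 'columns2': [1, 2],
                   'columns3': [[(122, 0.5), (104, 0)], [(104, 0.6)]]})
# dict() accepts a list of (key, value) pairs directly.
df1 = pd.DataFrame([dict(x) for x in df.pop('columns3')],
                   index=df.index).sort_index(axis=1)
df = df.join(df1)
print(df)
#   columns1  columns2  104  122
# 0        A         1  0.0  0.5
# 1        B         2  0.6  NaN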
I have a dataframe df as follows with about 200 columns:
Date       Run_1  Run_295  Prc
2/1/2020                   3
2/2/2020   2               6
2/3/2020          5        2
I want to subtract column Prc from columns Run_1, Run_295, Run_300, ... only where they are non-NaN/non-empty, to get the following:
Date       Run_1  Run_295
2/1/2020
2/2/2020   -4
2/3/2020          3
I am not sure how to proceed with the above.
Code to reproduce the dataframe:
import pandas as pd
from io import StringIO
s = """Date,Run_1,Run_295,Prc
2/1/2020,,,3
2/2/2020,2,,6
2/3/2020,,5,2"""
df = pd.read_csv(StringIO(s))
print(df)
You can simply subtract it; the subtraction does exactly what you want, because NaN minus anything stays NaN:
df.Run_1-df.Prc
Here is the complete code for your output:
df.Run_1 = df.Run_1 - df.Prc
df.Run_295 = df.Run_295 - df.Prc
df.drop('Prc', axis=1, inplace=True)
df
Date Run_1 Run_295
0 2/1/2020 NaN NaN
1 2/2/2020 -4.0 NaN
2 2/3/2020 NaN 3.0
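Since the real frame has about 200 Run_ columns, doing this column by column does not scale. A hedged sketch that applies the same subtraction to every column whose name contains 'Run_' (the column selection via filter assumes that naming pattern):
import pandas as pd
from io import StringIO

s = """Date,Run_1,Run_295,Prc
2/1/2020,,,3
2/2/2020,2,,6
2/3/2020,,5,2"""
df = pd.read_csv(StringIO(s))

# Select the Run_* columns and subtract Prc row-wise; NaN cells stay NaN.
run_cols = df.filter(like='Run_').columns
df[run_cols] = df[run_cols].sub(df['Prc'], axis=0)
df = df.drop(columns='Prc')
print(df)
#        Date  Run_1  Run_295
# 0  2/1/2020    NaN      NaN
# 1  2/2/2020   -4.0      NaN
# 2  2/3/2020    NaN      3.0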
Three steps: melt to unpivot your DataFrame, then loc to handle the assignment, and GroupBy to remake your original df.
I'm sure there is a better way to do this, but it avoids loops and apply:
cols = df.columns
# Unpivot the Run_* columns into long format.
s = pd.melt(df, id_vars=['Date', 'Prc'], value_name='Run Rate')
# Subtract Prc only where a run-rate value exists.
s.loc[s['Run Rate'].notna(), 'Run Rate'] = s['Run Rate'] - s['Prc']
# Re-make the original wide shape and restore Date/Prc as columns.
df_new = (s.groupby([s['Date'], s['Prc'], s['variable']])['Run Rate']
            .first().unstack(-1).reset_index())
print(df_new[cols])
variable Date Run_1 Run_295 Prc
0 2/1/2020 NaN NaN 3
1 2/2/2020 -4.0 NaN 6
2 2/3/2020 NaN 3.0 2