How can I replace NaN values in DataFrame from another table? - pandas

I have a DataFrame 'df'.
The second DataFrame is 'nan_gdp'.
How can I fill the NaN GDP values in 'nan_gdp' using my first DataFrame 'df'?
Also, the first DataFrame does not contain every country: some countries are in 'nan_gdp' but not in 'df'.

Use Series.fillna with values looked up from df via Series.map. Countries that are missing from df map to NaN, so their gaps are simply left unfilled:
s = df.set_index('Country')['GDP ($M)']
nan_gdp['GDP ($M)'] = nan_gdp['GDP ($M)'].fillna(nan_gdp['Country'].map(s))
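Since the original tables are not shown, here is a minimal sketch of the same map-then-fillna idea on made-up data. The country names and GDP figures are hypothetical, and the sketch assumes each country appears only once in df, so that the lookup Series has a unique index.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table with known GDP values.
df = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "GDP ($M)": [100.0, 200.0, 300.0],
})

# Hypothetical table with gaps; country "D" does not exist in df.
nan_gdp = pd.DataFrame({
    "Country": ["A", "B", "D", "C"],
    "GDP ($M)": [np.nan, 250.0, np.nan, np.nan],
})

# Lookup Series: index = country, values = GDP.
s = df.set_index("Country")["GDP ($M)"]

# map() translates each country into its GDP from df (NaN if unknown);
# fillna() uses those values only where nan_gdp has a gap.
nan_gdp["GDP ($M)"] = nan_gdp["GDP ($M)"].fillna(nan_gdp["Country"].map(s))
print(nan_gdp)
```

Existing values (country "B" here) are left alone, and "D" stays NaN because there is nothing to fill it from.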

Related

Need to set non-null values to NaN based on column name

I need to update the non-null values in my columns: wherever a cell contains a specific text, namely the name of its own column, I want to set it to NaN. An image is attached for reference.
Consider the following dataset
Replace the column name with NaN. This won't replace any other string, e.g. X0 still contains the value oth.
df.apply(lambda s: s.replace({s.name: np.nan}))
Replace all strings with NaN
df.apply(lambda s: pd.to_numeric(s, errors='coerce'))
Replace all strings with NaN on a subset of columns
COLS = ['X0', 'X2']
df.apply(lambda s: pd.to_numeric(s, errors='coerce') if s.name in COLS else s)
Note: I have used the pandas apply function, but the same result can be achieved with a for loop.
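For the for-loop variant mentioned in the note, a rough sketch could look like the following. The dataset is hypothetical (the original one is not shown), and the first loop uses Series.mask instead of replace, which is a swapped-in but equivalent way to blank out cells that equal the column name.

```python
import pandas as pd

# Hypothetical data: some cells repeat the name of their own column.
df = pd.DataFrame({
    "X0": ["X0", 1, "oth"],
    "X1": [2, "X1", 3],
    "X2": ["X2", "X2", 4],
})

# Variant 1: set only the cells equal to the column name to NaN.
result = df.copy()
for col in result.columns:
    # mask() puts NaN wherever the condition is True.
    result[col] = result[col].mask(result[col] == col)

# Variant 2: turn every non-numeric cell into NaN, on chosen columns only.
COLS = ["X0", "X2"]
numeric = df.copy()
for col in COLS:
    numeric[col] = pd.to_numeric(numeric[col], errors="coerce")

print(result)
print(numeric)
```

In variant 1 the string oth in X0 survives, while in variant 2 it is coerced to NaN along with every other non-numeric value in the selected columns.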

How to pull a specific value from one dataframe into another?

I have two dataframes
How would one populate the values in bold from df1 into the column 'Value' in df2?
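Use melt on df1 before merging your two DataFrames. A left merge keeps every row of df2 in its original order, so the result lines up with df2 when it is assigned back:
tmp = df1.melt('Rating', var_name='Category', value_name='Value2')
df2['Value'] = df2.merge(tmp, on=['Rating', 'Category'], how='left')['Value2']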
print(df2)
# Output
    Category Rating  Value
0  Hospitals    A++    2.5
1  Education     AA    2.1
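Because the input tables are missing here, this is a self-contained sketch of the melt-then-merge approach. The wide layout of df1 and the values 1.9 and 3.0 are assumptions made for illustration; only the two looked-up values match the output shown above.

```python
import pandas as pd

# Hypothetical wide table: one row per rating, one column per category.
df1 = pd.DataFrame({
    "Rating": ["A++", "AA"],
    "Hospitals": [2.5, 1.9],
    "Education": [3.0, 2.1],
})

# Table that needs one value per (Category, Rating) pair.
df2 = pd.DataFrame({
    "Category": ["Hospitals", "Education"],
    "Rating": ["A++", "AA"],
})

# melt() reshapes df1 to long form: Rating, Category, Value.
tmp = df1.melt("Rating", var_name="Category", value_name="Value")

# A left merge keeps every df2 row and picks up the matching value.
out = df2.merge(tmp, on=["Rating", "Category"], how="left")
print(out)
```

Pairs that have no match in df1 would end up with NaN in Value instead of being dropped.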

Getting NaN converting pandas Dataframe to Series

I am trying to convert a pandas DataFrame to a Series, based on the accepted answer to "Convert dataframe to series for multiple column".
However I am getting NaN in my integer column 'y'.
Here is my code:
data = [['2021-10-14 18:12:00.000', '22811316'],['2021-10-14 18:42:00.000', '22700704']]
df = pd.DataFrame(data, columns = ['ds', 'y'])
series = pd.Series(df.y, index=df.ds)
printing series gives me:
ds
2021-10-14 18:12:00.000 NaN
2021-10-14 18:42:00.000 NaN
Name: y, dtype: object
What am I missing?
I found the answer in "pandas.Series() Creation using DataFrame Columns returns NaN Data entries".
The trick was to use:
series = pd.Series(df.y.values, index=df.ds)
If you just take the series df.y, you will obtain a series with new indices starting from 0, 1, ...
print(df.y)
0 22811316
1 22700704
Name: y, dtype: object
These index labels do not match the values of the column ds that you want to use as the index.
So, when you create the new series with index=..., pandas aligns on labels, finds no matches, and fills every entry with NaN.
To put just the values of the y column into the new series, pass only its values, using to_numpy():
series = pd.Series(df.y.to_numpy(), index=df.ds)
print(series)
ds
2021-10-14 18:12:00.000 22811316
2021-10-14 18:42:00.000 22700704
dtype: object
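As an alternative sketch, set_index avoids the alignment issue altogether, because it keeps each row's value attached to its new label. The dtype conversions at the end are an optional extra, assuming you actually want integers and timestamps rather than the strings in the sample data.

```python
import pandas as pd

data = [["2021-10-14 18:12:00.000", "22811316"],
        ["2021-10-14 18:42:00.000", "22700704"]]
df = pd.DataFrame(data, columns=["ds", "y"])

# Passing a Series plus a new index reindexes by label: the labels 0 and 1
# are not found among the ds values, so every entry becomes NaN.
misaligned = pd.Series(df["y"], index=df["ds"])

# set_index moves ds into the index and keeps each row's y value.
series = df.set_index("ds")["y"]

# Optional: the sample values are strings, so convert them explicitly.
series = series.astype("int64")
series.index = pd.to_datetime(series.index)
print(series)
```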

Fillna() depending on another column

I want to do the following:
Fill DF1 NaN with values from DF2 depending on column value in DF1.
Basically, DF1 has people with "income_type" and some NaN in "total_income". In DF2 there are "median income" for each "income_type". I want to fill NaN in "total_income" DF1 with median values from DF2
DF1, DF2
First, I would merge values from DF2 to DF1 by 'income_type'
DF3 = DF1.merge(DF2, how='left', on='income_type')
This way you have the values of median income and total income in the same dataframe.
After this, I would assign conditionally with a boolean mask, so that only the rows where total_income is NaN are overwritten:
DF3.loc[DF3['total_income'].isna(), 'total_income'] = DF3['median income']
That will replace the NaN values with the median values brought in by the merge.
You need to join the two dataframes and then replace the nan values with the median. Here is a similar working example. Cheers mate.
import pandas as pd
# create the example dataframes
df1 = pd.DataFrame({'income_type':['a','b','c','a','a','b','b'], 'total_income':[200, 300, 500,None,400,None,None]})
df2 = pd.DataFrame({'income_type':['a','b','c'], 'median_income':[205, 305, 505]})
# inner join df1 with df2 on the column 'income_type'
joined = df1.merge(df2, on='income_type')
# fill the NaN values with the value from the column 'median_income' and save the result in a new column 'total_income_not_na'
joined['total_income_not_na'] = joined['total_income'].fillna(joined['median_income'])
print(joined)
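If you would rather not carry an extra column around, a possible alternative (not what the answers above use) is to look the medians up with Series.map and fill in place. This sketch reuses the example frames from above and assumes each income_type appears only once in df2.

```python
import pandas as pd

df1 = pd.DataFrame({
    "income_type": ["a", "b", "c", "a", "a", "b", "b"],
    "total_income": [200, 300, 500, None, 400, None, None],
})
df2 = pd.DataFrame({
    "income_type": ["a", "b", "c"],
    "median_income": [205, 305, 505],
})

# Lookup Series: index = income_type, values = median_income.
medians = df2.set_index("income_type")["median_income"]

# Fill each gap with the median for that row's income_type.
df1["total_income"] = df1["total_income"].fillna(df1["income_type"].map(medians))
print(df1)
```

Unlike the inner join, this keeps rows whose income_type is missing from df2; their gaps just stay NaN.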

concat series onto dataframe with column name

I want to add a Series (s) to a Pandas DataFrame (df) as a new column. The series has more values than there are rows in the dataframe, so I am using the concat method along axis 1.
df = pd.concat((df, s), axis=1)
This works, but the new column of the dataframe representing the series is given an arbitrary numerical column name, and I would like this column to have a specific name instead.
Is there a way to add a series to a dataframe, when the series is longer than the rows of the dataframe, and with a specified column name in the resulting dataframe?
You can try Series.rename:
df = pd.concat((df, s.rename('col')), axis=1)
One option is simply to specify the name when creating the series:
example_scores = pd.Series([1,2,3,4], index=['t1', 't2', 't3', 't4'], name='example_scores')
Using the name attribute when creating the series is all I needed.
Try:
df = pd.concat((df, s.rename('CoolColumnName')), axis=1)
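A small runnable sketch of the rename approach, with made-up data in which the series is longer than the DataFrame. The column name 'col' is just a placeholder.

```python
import pandas as pd

# Hypothetical frame with two rows and a series with four values.
df = pd.DataFrame({"a": [1, 2]})
s = pd.Series([10, 20, 30, 40])

# rename() returns a copy of the series with the given name;
# concat along axis=1 uses that name as the column label.
out = pd.concat((df, s.rename("col")), axis=1)
print(out)
```

The result has four rows: concat takes the union of both indexes, so the rows that exist only in the series get NaN in column a.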