This question already has answers here:
How to replace NaN values by Zeroes in a column of a Pandas Dataframe?
(17 answers)
Closed 2 years ago.
df_train[catcols] = df_train[catcols].fillna("NANO")
df_test[catcols[:-2]] = df_test[catcols[:-2]].fillna("NANO")
fillna is a method available on both pandas Series and DataFrame objects. It replaces NA/NaN values with a value you specify.
Syntax:
df.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)
In this case
df_train[catcols] = df_train[catcols].fillna("NANO")
df_test[catcols[:-2]] = df_test[catcols[:-2]].fillna("NANO")
the string 'NANO' replaces every missing (None/NaN) value found in the selected columns.
For example:
If your dataframe df is:
Index Name
1 Jacob
2 Andrew
3 NONE
4 NONE
5 Steve
df['Name'].fillna('Aagam')
Index Name
1 Jacob
2 Andrew
3 Aagam
4 Aagam
5 Steve
To learn more, see the pandas.DataFrame.fillna documentation.
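A minimal, runnable sketch of the example above. The Name values mirror the table in the answer; the City column and the catcols list are hypothetical, added only to illustrate the multi-column form used in the question.

```python
import pandas as pd

# Sample data mirroring the table above; None marks a missing value.
df = pd.DataFrame(
    {
        "Name": ["Jacob", "Andrew", None, None, "Steve"],
        "City": ["Oslo", None, "Lima", None, "Rome"],  # hypothetical column
    },
    index=[1, 2, 3, 4, 5],
)

# fillna returns a new Series; df itself is unchanged by this call.
filled_names = df["Name"].fillna("Aagam")

# Multi-column form: assign the result back to the same columns.
catcols = ["Name", "City"]  # hypothetical list of categorical columns
df[catcols] = df[catcols].fillna("NANO")
```

Because fillna returns a new object by default, the result has to be assigned back (as in the question's code) for the change to persist.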
This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 2 years ago.
I am completely new to programming and started learning Python recently.
I have a pandas DataFrame df as shown in image 1 and am trying to rearrange the columns as shown in image 2.
Could you please help me with this?
Thanks and Regards,
Arya.
You can use pd.pivot_table like this:
df=pd.DataFrame({'index':[0,1,2,0,1,2],'Name':['A','A','A','B','B','B'],'Value':[10,20,30,15,25,35]})
df.pivot_table(index='index',columns='Name',values='Value').reset_index()
Out[8]:
Name index A B
0 0 10 15
1 1 20 25
2 2 30 35
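Note that pivot_table aggregates duplicate entries (using the mean by default). If every (index, Name) pair is unique, as in the sample data above, DataFrame.pivot performs the same reshape without any aggregation. A small sketch on the same data; the variable name wide is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "index": [0, 1, 2, 0, 1, 2],
    "Name": ["A", "A", "A", "B", "B", "B"],
    "Value": [10, 20, 30, 15, 25, 35],
})

# pivot raises an error on duplicate (index, Name) pairs instead of averaging them.
wide = df.pivot(index="index", columns="Name", values="Value").reset_index()

# Drop the leftover "Name" label on the column axis for a plain header.
wide.columns.name = None
```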
This question already has answers here:
Pandas .loc without KeyError
(6 answers)
Closed 2 years ago.
Say I have a DataFrame (with a multi-index, for that matter), and I wish to take the values at some index - but, if that index does not exist, I wish for it to return an empty df instead of a KeyError.
I've searched for similar questions, but they are all about pandas returning an empty dataframe when that is not desired (whereas I do want an empty dataframe in return).
For example:
import pandas as pd
df = pd.DataFrame(index=pd.MultiIndex.from_tuples([(1,1),(1,2),(3,1)]),
                  columns=['a','b'], data=[[1,2],[3,4],[10,20]])
So df is:
      a   b
1 1   1   2
  2   3   4
3 1  10  20
and df.loc[1] is:
   a  b
1  1  2
2  3  4
df.loc[2] raises a KeyError, and I'd like something that returns
a b
The closest I could get is by calling df.loc[idx:idx] as a slice, which gives the correct result for idx=2, but for idx=1 it returns
     a  b
1 1  1  2
  2  3  4
instead of the desired result.
Of course I can define a function to do it,
One idea, with an if-else expression:
def get_val(x):
    return df.loc[x] if x in df.index.levels[0] else pd.DataFrame(columns=df.columns)
Or, more generally, with a try-except statement:
def get_val(x):
    try:
        return df.loc[x]
    except KeyError:
        return pd.DataFrame(columns=df.columns)
print (get_val(1))
   a  b
1  1  2
2  3  4
print (get_val(2))
Empty DataFrame
Columns: [a, b]
Index: []
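As an alternative sketch (not part of the answer above), a boolean mask on the first index level never raises, so a missing key simply yields an empty frame. One difference from df.loc[x]: the result keeps both index levels. The helper name select_level0 is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    index=pd.MultiIndex.from_tuples([(1, 1), (1, 2), (3, 1)]),
    columns=["a", "b"],
    data=[[1, 2], [3, 4], [10, 20]],
)

def select_level0(frame, key):
    """Return the rows whose first index level equals key (empty if none match)."""
    mask = frame.index.get_level_values(0) == key
    return frame[mask]

present = select_level0(df, 1)  # two rows, both index levels retained
missing = select_level0(df, 2)  # empty frame that still has columns a and b
```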
This question already has answers here:
Appending to an empty DataFrame in Pandas?
(5 answers)
Creating an empty Pandas DataFrame, and then filling it
(8 answers)
Closed 3 years ago.
I am trying to append a new row to an empty DataFrame, and I found that the code below works:
import pandas as pd
df = pd.DataFrame(columns=['A'])
for i in range(5):
    df = df.append({'A': i}, ignore_index=True)
So, it gives me:
A
0 0
1 1
2 2
3 3
4 4
But when I try the code below, my DataFrame is still empty:
df = pd.DataFrame(columns=['A'])
df.append({'A': 2}, ignore_index=True)
df
Can someone explain how to add only one row?
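The second snippet leaves df empty because append returns a new DataFrame rather than modifying df in place, so the result must be assigned back (as the loop version does). In addition, DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. A sketch of the same single-row addition using pd.concat, which is available on current versions; the name new_row is only illustrative:

```python
import pandas as pd

df = pd.DataFrame(columns=["A"])

# Wrap the new row in its own one-row DataFrame.
new_row = pd.DataFrame([{"A": 2}])

# concat returns a new object too, so assign the result back to df.
df = pd.concat([df, new_row], ignore_index=True)
```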
This question already has answers here:
How to replace NaNs by preceding or next values in pandas DataFrame?
(10 answers)
Closed 3 years ago.
I want to apply diffs down columns for a pandas dataframe.
EX:
A B C
23 40000 1
24 nan nan
nan 42000 2
I would want something like:
A B C
23 40000 1
24 40000 1
24 42000 2
I have tried variations of pandas groupby, which I think is probably the right approach (or applying some function down the columns, but I'm not sure whether that is efficient; correct me if I'm wrong).
I was able to "apply diffs down the column" and get something like:
A B C
24 42000 2
by calling: df = df.groupby('col', as_index=False).last() for each column, but this is not what I am looking for. I am not a pandas expert so apologies if this is a silly question.
Explained above
Look at this: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html
df = df.fillna(method='ffill')
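A runnable sketch on the data from the question. As far as I know, recent pandas releases deprecate the method argument of fillna in favor of the dedicated ffill method, so this sketch uses df.ffill(), which is intended to give the same forward-filled result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [23, 24, np.nan],
    "B": [40000, np.nan, 42000],
    "C": [1, np.nan, 2],
})

# Forward fill: each NaN takes the last valid value above it in the same column.
filled = df.ffill()
```

A NaN in the very first row would stay NaN, since there is no earlier value to copy.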
This question already has answers here:
How to reset index in a pandas dataframe? [duplicate]
(3 answers)
Closed 4 years ago.
I have a grouped dataset, but I'm unable to convert it to JSON: the output is badly formatted. to_excel works fine.
Country Sub amount
3 source4
UK 1 source3
1 source1
US 2 source2
How can I export a grouped dataset with to_json?
The problem is that the DataFrame has a MultiIndex, so you need to call reset_index first:
j = df.reset_index().to_json()
print (j)
{"Country":{"0":"UK","1":"UK","2":"US"},
"Sub":{"0":1,"1":1,"2":2},
"amount":{"0":"source3","1":"source1","2":"source2"}}