Appending a list to a dataframe - pandas

I have a dataframe let's say:
col1 col2 col3
1 x 3
1 y 4
and I have a list:
2
3
4
5
Can I append the list to the data frame like this:
col1 col2 col3
1 x 3
1 y 4
2 NaN NaN
3 NaN NaN
4 NaN NaN
5 NaN NaN
Thank you.

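Use concat with the DataFrame constructor (DataFrame.append did the same in older versions, but it was deprecated in pandas 1.4 and removed in 2.0):
# df = df.append(pd.DataFrame([2,3,4,5], columns=['col1']))  # pandas < 2.0 only
df = pd.concat([df, pd.DataFrame([2,3,4,5], columns=['col1'])])
print(df)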
col1 col2 col3
0 1 x 3.0
1 1 y 4.0
0 2 NaN NaN
1 3 NaN NaN
2 4 NaN NaN
3 5 NaN NaN
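The output above keeps the original index of each piece, so the labels 0 and 1 appear twice. If a fresh 0..n-1 index is preferred, a minimal sketch (the frame is rebuilt from the question's sample, and the names extra and result are only illustrative) could pass ignore_index=True:

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({"col1": [1, 1], "col2": ["x", "y"], "col3": [3, 4]})

# The list goes into its own one-column frame; columns missing from it
# (col2, col3) are filled with NaN by concat.
extra = pd.DataFrame({"col1": [2, 3, 4, 5]})

# ignore_index=True discards both old indexes and renumbers the rows.
result = pd.concat([df, extra], ignore_index=True)
print(result)
```

Note that col3 becomes a float column, because NaN cannot be stored in an integer column.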

Related

How to select the rows having same id and have all missing value in another column

I have the following dataframe:
ID col_1
1 NaN
2 NaN
3 4.0
2 NaN
2 NaN
3 NaN
3 3.0
1 NaN
I need the following output:
ID col_1
1 NaN
1 NaN
2 NaN
2 NaN
2 NaN
How can I do this in pandas?
You can create a boolean mask with isna, group that mask by ID and transform it with all, so that every row is flagged according to whether its whole group is NaN. Then filter the rows with this mask:
mask = df['col_1'].isna().groupby(df['ID']).transform('all')
df[mask].sort_values('ID')
Alternatively, you can use groupby + filter to keep only the groups in which all values in col_1 are NaN, but this method should be slower than the one above:
df.groupby('ID').filter(lambda g: g['col_1'].isna().all()).sort_values('ID')
ID col_1
0 1 NaN
7 1 NaN
1 2 NaN
3 2 NaN
4 2 NaN
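For anyone who wants to try this, here is a self-contained sketch of both approaches on the question's sample data. The variable names via_mask and via_filter are just illustrative, and kind="stable" is an addition so that rows with the same ID keep their original relative order:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 2, 2, 3, 3, 1],
    "col_1": [np.nan, np.nan, 4.0, np.nan, np.nan, np.nan, 3.0, np.nan],
})

# True for every row whose ID group contains only NaN in col_1.
mask = df["col_1"].isna().groupby(df["ID"]).transform("all")
via_mask = df[mask].sort_values("ID", kind="stable")

# Same selection with groupby + filter: keep groups where all values are NaN.
via_filter = (
    df.groupby("ID")
      .filter(lambda g: g["col_1"].isna().all())
      .sort_values("ID", kind="stable")
)

print(via_mask)
```

ID 3 is dropped entirely, including its single NaN row, because the group also contains 4.0 and 3.0.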
Let us try isin after groupby with all:
s = df['col_1'].isna().groupby(df['ID']).all()
df = df.loc[df.ID.isin(s[s].index.tolist())]
df
Out[73]:
ID col_1
0 1 NaN
1 2 NaN
3 2 NaN
4 2 NaN
7 1 NaN
import pandas as pd
import numpy as np
df=pd.read_excel(r"D:\Stack_overflow\test12.xlsx")
df1=(df[df['col_1'].isnull()]).sort_values(by=['ID'])
I think we can simply take the rows with null values. Note that this also keeps the NaN row of ID 3 (index 5), which the requested output excludes.

In pandas replace consecutive 0s with NaN

I want to clean some data by replacing only CONSECUTIVE 0s in a data frame
Given:
import pandas as pd
import numpy as np
d = [[1,np.nan,3,4],[2,0,0,np.nan],[3,np.nan,0,0],[4,np.nan,0,0]]
df = pd.DataFrame(d, columns=['a', 'b', 'c', 'd'])
df
a b c d
0 1 NaN 3 4.0
1 2 0.0 0 NaN
2 3 NaN 0 0.0
3 4 NaN 0 0.0
The desired result should be:
a b c d
0 1 NaN 3 4.0
1 2 0.0 NaN NaN
2 3 NaN NaN NaN
3 4 NaN NaN NaN
where column c & d are affected but column b is NOT affected as it only has 1 zero (and not consecutive 0s).
I have experimented with this answer:
Replacing more than n consecutive values in Pandas DataFrame column
which is along the right lines but the solution keeps the first 0 in a given column which is not desired in my case.
Let us use shift with mask, so that a zero is replaced only if it equals the value directly above or below it:
df=df.mask((df.shift().eq(df)|df.eq(df.shift(-1)))&(df==0))
Out[469]:
a b c d
0 1 NaN 3.0 4.0
1 2 0.0 NaN NaN
2 3 NaN NaN NaN
3 4 NaN NaN NaN
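The one-liner above is dense, so here is the same logic split into named steps, as a sketch on the question's data. The intermediate names (is_zero, same_as_prev, same_as_next, in_zero_run, cleaned) are only illustrative:

```python
import numpy as np
import pandas as pd

d = [[1, np.nan, 3, 4], [2, 0, 0, np.nan], [3, np.nan, 0, 0], [4, np.nan, 0, 0]]
df = pd.DataFrame(d, columns=["a", "b", "c", "d"])

is_zero = df.eq(0)                   # cell holds a 0
same_as_prev = df.eq(df.shift())     # cell equals the cell one row above
same_as_next = df.eq(df.shift(-1))   # cell equals the cell one row below

# A zero belongs to a run of consecutive zeros if a neighbour is equal to it.
# NaN never compares equal, so the isolated 0 in column b is left alone.
in_zero_run = is_zero & (same_as_prev | same_as_next)

# mask() replaces the flagged cells with NaN.
cleaned = df.mask(in_zero_run)
print(cleaned)
```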

Merge columns based on values in multiple columns pandas

I have a DataFrame as follows:
Name Col2 Col3
0 A 16-1-2000 NaN
1 B 13-2-2001 NaN
2 C NaN NaN
3 D NaN 23-4-2014
4 X NaN NaN
5 Q NaN 4-5-2009
I want to build a combined column from whichever of Col2 and Col3 holds data, so that I get the following output:
Name Col2 Col3 Result
0 A 16-1-2000 NaN 16-1-2000
1 B 13-2-2001 NaN 13-2-2001
2 C NaN NaN NaN
3 D NaN 23-4-2014 23-4-2014
4 X NaN NaN NaN
5 Q NaN 4-5-2009 4-5-2009
I have tried following:
df['Result'] = np.where(df["Col2"].isnull() & df["Col3"].isnull(), np.nan, df["Col2"] if dfCrisiltemp["Col2"].notnull() else df["Col3"])
but no success.
Use combine_first or fillna:
df['new'] = df["Col2"].combine_first(df["Col3"])
#alternative
#df['new'] = df["Col2"].fillna(df["Col3"])
print (df)
Name Col2 Col3 new
0 A 16-1-2000 NaN 16-1-2000
1 B 13-2-2001 NaN 13-2-2001
2 C NaN NaN NaN
3 D NaN 23-4-2014 23-4-2014
4 X NaN NaN NaN
5 Q NaN 4-5-2009 4-5-2009
Your np.where attempt can be fixed by nesting a second np.where:
df['new'] = np.where(df["Col2"].notnull() & df["Col3"].isnull(), df["Col2"],
np.where(df["Col2"].isnull() & df["Col3"].notnull(), df["Col3"], np.nan))
Or numpy.select:
m1 = df["Col2"].notnull() & df["Col3"].isnull()
m2 = df["Col2"].isnull() & df["Col3"].notnull()
df['new'] = np.select([m1, m2], [df["Col2"], df["Col3"]], np.nan)
For a general solution, select all columns except the first with iloc, forward fill NaNs along the rows, and finally select the last column:
df['new'] = df.iloc[:, 1:].ffill(axis=1).iloc[:, -1]
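To check the first two options on the question's data, a small self-contained sketch (the frame is rebuilt from the question, and alt is just an illustrative name) might look like this:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; dates are kept as plain strings.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "X", "Q"],
    "Col2": ["16-1-2000", "13-2-2001", np.nan, np.nan, np.nan, np.nan],
    "Col3": [np.nan, np.nan, np.nan, "23-4-2014", np.nan, "4-5-2009"],
})

# Take Col2 where it is present, otherwise fall back to Col3.
df["Result"] = df["Col2"].combine_first(df["Col3"])

# fillna with another Series fills the gaps index-wise in the same way.
alt = df["Col2"].fillna(df["Col3"])

print(df)
```

One difference worth keeping in mind: if a row had values in both columns, combine_first and fillna would keep Col2, while the ffill(axis=1) variant would pick the last column.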

transforming data frame in ipython a little like transpose

Suppose I have a data frame like the following data.frame in pandas
a 1 11
a 3 12
a 20 13
b 2 14
b 4 15
I want to generate a resulting data.frame like this
V1 1 2 3 4 20
a 11 NaN 12 NaN 13
b NaN 14 NaN 15 NaN
How can I get this transformation?
Thank you.
You can use pivot:
import pandas as pd
df = pd.DataFrame({'col1': ['a','a','a','b','b'],
'col2': [1,3,20,2,4],
'col3': [11,12,13,14,15]})
print(df.pivot(index='col1', columns='col2'))
Output:
col3
col2 1 2 3 4 20
col1
a 11 NaN 12 NaN 13
b NaN 14 NaN 15 NaN
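The extra col3 level in the header appears because no values column was given. If a flat header is wanted, one option is to pass values explicitly. A minimal sketch with the same sample frame (the name wide is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'a', 'a', 'b', 'b'],
                   'col2': [1, 3, 20, 2, 4],
                   'col3': [11, 12, 13, 14, 15]})

# values='col3' selects the single column to spread out, so the resulting
# columns are a plain index of the col2 values instead of a MultiIndex.
wide = df.pivot(index='col1', columns='col2', values='col3')
print(wide)
```

Note that pivot raises an error if an index/columns pair occurs more than once; pivot_table with an aggregation function is the usual choice in that case.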

Boxplot with pandas and groupby

I have the following dataset sample:
0 1
0 0 0.040158
1 2 0.500642
2 0 0.005694
3 1 0.065052
4 0 0.034789
5 2 0.128495
6 1 0.088816
7 1 0.056725
8 0 -0.000193
9 2 -0.070252
10 2 0.138282
11 2 0.054638
12 2 0.039994
13 2 0.060659
14 0 0.038562
And need a box and whisker plot, grouped by column 0. I have the following:
plt.figure()
grouped = df.groupby(0)
grouped.boxplot(column=1)
plt.savefig('plot.png')
But I end up with three subplots. How can I place all three on one plot?
Thanks.
Since pandas 0.16.0 you can simply do this (use by='0' instead if your column labels are strings rather than integers):
df.boxplot(column=1, by=0)
I don't believe you need to use groupby.
df2 = df.pivot(columns=df.columns[0])
df2.columns = df2.columns.droplevel()
>>> df2
0 0 1 2
0 0.040158 NaN NaN
1 NaN NaN 0.500642
2 0.005694 NaN NaN
3 NaN 0.065052 NaN
4 0.034789 NaN NaN
5 NaN NaN 0.128495
6 NaN 0.088816 NaN
7 NaN 0.056725 NaN
8 -0.000193 NaN NaN
9 NaN NaN -0.070252
10 NaN NaN 0.138282
11 NaN NaN 0.054638
12 NaN NaN 0.039994
13 NaN NaN 0.060659
14 0.038562 NaN NaN
df2.boxplot()