How can I write Excel's IF(logical_test, [value_if_true], [value_if_false]), e.g. IF(columnArow1 = columnArow2, columnBrow2, ""), in Python (pandas)?

I would like to translate an Excel formula into Python (pandas).
I have filtered the rows with df.loc[df.Activity_Mailbox.isnull()]; the NA values now need to be calculated using
if (columnArow1 = columnArow2, columnBrow2, "")
This is the formula as written in Excel.

Next time, please provide some demo data, as you did in your other question :-)
If I understand you correctly, your data looks like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, np.nan, 5, np.nan],
                   "B": [10, 11, 12, 13, 14, 15]})
df
     A   B
0  1.0  10
1  2.0  11
2  3.0  12
3  NaN  13
4  5.0  14
5  NaN  15
Now you want to fill each NaN in column A with the value from column B in the same row. This can be done with:
df["A"] = df["A"].fillna(df["B"])
Output:
df
      A   B
0   1.0  10
1   2.0  11
2   3.0  12
3  13.0  13
4   5.0  14
5  15.0  15
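
If the Excel formula is meant literally, i.e. compare A in one row with A in the row below and take B from that row below when they match, a minimal sketch could look like this. The demo data is made up for illustration, NaN stands in for Excel's empty string "", and whether this reading matches what the asker intended is an assumption.

```python
import pandas as pd

# Hypothetical demo data, not taken from the question
df = pd.DataFrame({"A": [1, 1, 2, 3, 3],
                   "B": [10, 20, 30, 40, 50]})

next_a = df["A"].shift(-1)   # value of A one row below
next_b = df["B"].shift(-1)   # value of B one row below

# Keep next_b where A equals the A below it; every other row becomes NaN
df["C"] = next_b.where(df["A"].eq(next_a))
print(df)
```

The last row has no row below it, so it always ends up as NaN.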

How to groupby a dataframe with two level header and generate box plot?

Now I have a dataframe like below (original dataframe):
Equipment   A   B   C
1          10  10  10
1          11  11  11
2          12  12  12
2          13  13  13
3          14  14  14
3          15  15  15
And I want to transform the dataframe like below (transformed dataframe):
 1   -   -   2   -   -   3   -   -
 A   B   C   A   B   C   A   B   C
10  10  10  12  12  12  14  14  14
11  11  11  13  13  13  15  15  15
How can I do such a groupby transformation with a two-level header in pandas?
Additionally, I want to use the transformed dataframe to generate a box plot that is divided into three parts (1, 2, 3), each containing three boxes (A, B, C). Can I use the transformed dataframe (Image 2) without any further processing? Or can I produce the box plot from the original dataframe alone?
Thank you so much.
Try:
g = df.groupby('Equipment')[df.columns[1:]].apply(lambda x: x.reset_index(drop=True).T).T
g:
Equipment   1           2           3
            A   B   C   A   B   C   A   B   C
0          10  10  10  12  12  12  14  14  14
1          11  11  11  13  13  13  15  15  15
Explanation:
grp = df.groupby('Equipment')[df.columns[1:]]
grp.apply(print)
A B C
0 10 10 10
1 11 11 11
A B C
2 12 12 12
3 13 13 13
A B C
4 14 14 14
5 15 15 15
You can see that the index is 0 1, 2 3, and 4 5 for the three equipment groups (1, 2, 3).
That is why I used reset_index: it makes the index 0 1 within every group. Why is that needed?
If you transpose without resetting the index:
df.groupby('Equipment')[df.columns[1:]].apply(lambda x: x.T)
0 1 2 3 4 5
Equipment
1 A 10.0 11.0 NaN NaN NaN NaN
B 10.0 11.0 NaN NaN NaN NaN
C 10.0 11.0 NaN NaN NaN NaN
2 A NaN NaN 12.0 13.0 NaN NaN
B NaN NaN 12.0 13.0 NaN NaN
C NaN NaN 12.0 13.0 NaN NaN
3 A NaN NaN NaN NaN 14.0 15.0
B NaN NaN NaN NaN 14.0 15.0
C NaN NaN NaN NaN 14.0 15.0
Look at the values in columns (2, 3) and (4, 5): they should all sit in columns (0, 1). That is why the index is reset with drop=True, which gives:
0 1
Equipment
1 A 10 11
B 10 11
C 10 11
2 A 12 13
B 12 13
C 12 13
3 A 14 15
B 14 15
C 14 15
You can play with the code to understand what is happening inside. A final .T on this result gives the two-level column header shown in g.
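
The same reshaping can also be sketched without apply, by numbering the rows inside each group with cumcount and then unstacking. The dataframe below is rebuilt from the question's table; the names pos and wide are only illustrative, and this is an alternative to the answer above rather than the answerer's own method.

```python
import pandas as pd

df = pd.DataFrame({
    "Equipment": [1, 1, 2, 2, 3, 3],
    "A": [10, 11, 12, 13, 14, 15],
    "B": [10, 11, 12, 13, 14, 15],
    "C": [10, 11, 12, 13, 14, 15],
})

# Position of each row within its Equipment group: 0, 1, 0, 1, 0, 1
pos = df.groupby("Equipment").cumcount()

wide = (
    df.set_index(["Equipment", pos])   # row index: (Equipment, position)
      .unstack("Equipment")            # columns: (A/B/C, Equipment)
      .swaplevel(axis=1)               # columns: (Equipment, A/B/C)
      .sort_index(axis=1)              # keep each Equipment's columns together
)
print(wide)
```

Here cumcount plays the role that reset_index(drop=True) plays in the apply version: it gives every group the same 0, 1 row labels so the groups line up side by side.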

How to keep True and None Value using pandas?

I have one DataFrame:
import pandas as pd
data = {'a': [1,2,3,None,4,None,2,4,5,None],'b':[6,6,6,'NaN',4,'NaN',11,11,11,'NaN']}
df = pd.DataFrame(data)
condition = (df['a']>2) | (df['a'] == None)
print(df[condition])
a b
0 1.0 6
1 2.0 6
2 3.0 6
3 NaN NaN
4 4.0 4
5 NaN NaN
6 2.0 11
7 4.0 11
8 5.0 11
9 NaN NaN
I need to keep the rows where the condition is True, and I also want to keep the rows where a is None.
Expected output is :
a b
2 3.0 6
3 NaN NaN
4 4.0 4
5 NaN NaN
7 4.0 11
8 5.0 11
9 NaN NaN
Thanks in Advance
You can add another condition with | (or). (Note: see #ALlolz's comment; you shouldn't compare a Series with np.nan using ==.)
condition = (df['a']>2) | (df['a'].isna())
df[condition]
a b
2 3.0 6
3 NaN NaN
4 4.0 4
5 NaN NaN
7 4.0 11
8 5.0 11
9 NaN NaN
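
A short sketch of why the original condition missed the empty rows: when the Series is built, None in a numeric column is stored as NaN, and an elementwise == against None is False everywhere, whereas isna() flags the missing entries. The Series s below is a shortened, illustrative version of column a.

```python
import pandas as pd

s = pd.Series([1, 2, 3, None, 4])   # None is stored as NaN (float64)

eq_none = s == None                  # elementwise comparison, never True here
missing = s.isna()                   # True exactly where the value is missing

condition = (s > 2) | missing
print(s[condition])
```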

pandas: fillna whole df with groupby

I have the following df with a lot more number columns. I now want to make a forward filling for all the columns in the dataframe but grouped by id.
id date number number2
1 2001 4 11
1 2002 4 45
1 2003 NaN 13
2 2001 7 NaN
2 2002 8 2
The result should look like this:
id date number number2
1 2001 4 11
1 2002 4 45
1 2003 4 13
2 2001 7 NaN
2 2002 8 2
I tried the following command:
df= df.groupby("id").fillna(method="ffill", limit=2)
However, this raises a KeyError "isin". Filling just one column with the following command works just fine, but how can I efficiently forward fill the whole df grouped by isin?
df["number"]= df.groupby("id")["number"].fillna(method="ffill", limit=2)
You can use:
df = df.groupby("id").apply(lambda x: x.ffill(limit=2))
print (df)
id date number number2
0 1 2001 4.0 11.0
1 1 2002 4.0 45.0
2 1 2003 4.0 13.0
3 2 2001 7.0 NaN
4 2 2002 8.0 2.0
This also works for me:
df.groupby("id").fillna(method="ffill", limit=2)
so I think you need to upgrade pandas.
ffill can also be used directly on the groupby:
df.groupby('id').ffill(limit=2)
Out[423]:
id date number number2
0 1 2001 4.0 11.0
1 1 2002 4.0 45.0
2 1 2003 4.0 13.0
3 2 2001 7.0 NaN
4 2 2002 8.0 2.0
#isin
#df.loc[:,df.columns.isin([''])]=df.loc[:,df.columns.isin([''])].groupby('id').ffill(2)
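
On recent pandas versions the method= argument of fillna is deprecated, and the result of a groupby ffill may not include the grouping column, so the outputs above can look different. A minimal sketch that sidesteps both points by filling only the value columns and writing them back; the names value_cols and filled are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "date": [2001, 2002, 2003, 2001, 2002],
    "number": [4, 4, np.nan, 7, 8],
    "number2": [11, 45, 13, np.nan, 2],
})

# Every column except the grouping key
value_cols = [c for c in df.columns if c != "id"]

filled = df.copy()
# Forward fill inside each id group, at most 2 consecutive NaN per gap
filled[value_cols] = df.groupby("id")[value_cols].ffill(limit=2)
print(filled)
```

The NaN in number2 for id 2 stays as it is, because there is no earlier value in that group to carry forward.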

Pandas rolling function with specific numeric span?

As of pandas 0.19.0, it is possible to have a variable rolling window size for time series by specifying a time span. For example, the code for summation over a 2-second window in dataframe dft looks like this:
dft.rolling('2s').sum()
Is it possible to do the same with non-datetime spans?
For example, given a dataframe that looks like this:
A B
0 1 1
1 2 2
2 3 3
3 5 5
4 6 6
5 7 7
6 10 10
Is it possible to specify a window span of say 3 on column 'A' and have the sum of column 'B' calculated, so that the output looks something like:
A B
0 1 NaN
1 2 NaN
2 3 5
3 5 10
4 6 14
5 7 18
6 10 17
Not with rolling(). See the documentation for the window argument:
[A variable-sized window] is only valid for datetimelike indexes.
Full text:
window : int, or offset
Size of the moving window. This is the number of observations used for calculating the statistic. Each window will be a fixed size.
If its an offset then this will be the time period of each window. Each window will be a variable sized based on the observations included in the time-period. This is only valid for datetimelike indexes.
Here's a workaround if you're interested.
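import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.arange(10),
                   'B': np.arange(10, 20)},
                  index=[1, 2, 3, 5, 8, 9, 11, 14, 19, 20])

def var_window(df, size, min_periods=None):
    """Sum over a trailing window of `size` index units. Operates on the index."""
    result = []
    df = df.sort_index()
    for i in df.index:
        start = i - size + 1                    # first index value inside the window
        res = df.loc[start:i].sum().tolist()    # label slice, inclusive on both ends
        result.append(res)
    # float dtype so that NaN can be assigned below
    result = pd.DataFrame(result, index=df.index, dtype=float)
    if min_periods:
        # blank out the first min_periods - 1 rows, by position
        result.iloc[:min_periods - 1] = np.nan
    return result

print(var_window(df, size=3, min_periods=3))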
0 1
1 NaN NaN
2 NaN NaN
3 3.0 33.0
5 5.0 25.0
8 4.0 14.0
9 9.0 29.0
11 11.0 31.0
14 7.0 17.0
19 8.0 18.0
20 17.0 37.0
Explanation: loop through the index. At each value, truncate the DataFrame to the trailing window size. Here 'size' is not a count, but rather a range as you have defined it.
In the above, at the index value of 8, you're summing the values of A for which the index is 8, 7, or 6 (i.e. >= 8 - 3 + 1). The only index value that falls within that range is 8, so the sum is simply the value from the original frame. By comparison, for the index value of 11, the sum includes the values at 9 and 11 (5 + 6 = 11, the resulting sum for A).
Compare this with standard rolling ops:
print(df.rolling(window=3).sum())
A B
1 NaN NaN
2 NaN NaN
3 3.0 33.0
5 6.0 36.0
8 9.0 39.0
9 12.0 42.0
11 15.0 45.0
14 18.0 48.0
19 21.0 51.0
20 24.0 54.0
If I'm misinterpreting your question, let me know how. The workaround is admittedly significantly slower than the built-in rolling:
%timeit df.rolling(window=3).sum()
1000 loops, best of 3: 627 µs per loop
%timeit var_window(df, size=3, min_periods=3)
100 loops, best of 3: 3.59 ms per loop
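
Since the Python loop is the slow part, one possible variation replaces it with cumulative sums and np.searchsorted. This is only a sketch and was not part of the original answer: value_window_sum is a hypothetical helper name, it assumes a numeric index and no NaN in the data, and it leaves out the min_periods handling.

```python
import numpy as np
import pandas as pd

def value_window_sum(df, size):
    """Sum each column over the trailing index range [i - size + 1, i]."""
    df = df.sort_index()
    idx = df.index.to_numpy()
    # First row position whose index value is >= i - size + 1
    left = np.searchsorted(idx, idx - size + 1, side="left")
    # One past the last row position whose index value is <= i
    right = np.searchsorted(idx, idx, side="right")
    # Cumulative sums with a leading row of zeros, so a window sum is a difference
    values = df.to_numpy(dtype=float)
    csum = np.vstack([np.zeros((1, values.shape[1])), values.cumsum(axis=0)])
    return pd.DataFrame(csum[right] - csum[left],
                        index=df.index, columns=df.columns)

df = pd.DataFrame({'A': np.arange(10),
                   'B': np.arange(10, 20)},
                  index=[1, 2, 3, 5, 8, 9, 11, 14, 19, 20])
print(value_window_sum(df, size=3))
```

From index 3 onward the sums agree with the var_window output above; the first two rows hold partial sums here instead of NaN.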

transforming data frame in ipython a little like transpose

Suppose I have a data frame like the following in pandas:
a 1 11
a 3 12
a 20 13
b 2 14
b 4 15
I want to generate a resulting data frame like this:
V1 1 2 3 4 20
a 11 NaN 12 NaN 13
b NaN 14 NaN 15 NaN
How can I get this transformation?
Thank you.
You can use pivot:
import pandas as pd
df = pd.DataFrame({'col1': ['a', 'a', 'a', 'b', 'b'],
                   'col2': [1, 3, 20, 2, 4],
                   'col3': [11, 12, 13, 14, 15]})
print(df.pivot(index='col1', columns='col2'))
Output:
col3
col2 1 2 3 4 20
col1
a 11 NaN 12 NaN 13
b NaN 14 NaN 15 NaN
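
The extra col3 level in the header appears because no values column was named. Passing values='col3' yields a single-level header, which is closer to the layout asked for; the variable name wide is illustrative. On current pandas the cells print as floats (11.0 rather than 11) because the NaN entries force a float dtype.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'a', 'a', 'b', 'b'],
                   'col2': [1, 3, 20, 2, 4],
                   'col3': [11, 12, 13, 14, 15]})

# One row per col1 value, one column per col2 value, cells taken from col3
wide = df.pivot(index='col1', columns='col2', values='col3')
print(wide)
```

Note that pivot raises an error if a (col1, col2) pair occurs more than once; pivot_table with an aggregation function is the usual choice in that case.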