How to groupby a dataframe with two level header and generate box plot? - pandas

Now I have a dataframe like below (original dataframe):
Equipment   A   B   C
1          10  10  10
1          11  11  11
2          12  12  12
2          13  13  13
3          14  14  14
3          15  15  15
And I want to transform the dataframe like below (transformed dataframe):
 1           2           3
 A   B   C   A   B   C   A   B   C
10  10  10  12  12  12  14  14  14
11  11  11  13  13  13  15  15  15
How can I do this groupby transformation into a two-level header with pandas?
Additionally, I want to use the transformed dataframe to generate a box plot. The whole plot should be divided into three parts (i.e. 1, 2, 3), and each part should contain three boxes (i.e. A, B, C). Can I use the transformed dataframe in Image 2 without any further processing? Or can I produce the box plot from the original dataframe alone?
Thank you so much.

Try:
g = df.groupby('Equipment')[df.columns[1:]].apply(lambda x: x.reset_index(drop=True).T).T
g:
Equipment   1           2           3
            A   B   C   A   B   C   A   B   C
0          10  10  10  12  12  12  14  14  14
1          11  11  11  13  13  13  15  15  15
Explanation:
grp = df.groupby('Equipment')[df.columns[1:]]
grp.apply(print)
A B C
0 10 10 10
1 11 11 11
A B C
2 12 12 12
3 13 13 13
A B C
4 14 14 14
5 15 15 15
You can see that the row indexes are 0 1, 2 3 and 4 5 for the equipment groups 1, 2 and 3.
That is why I used reset_index(drop=True): it renumbers the rows of every group as 0 1, so all groups line up on the same two rows.
If you do it without reset_index:
df.groupby('Equipment')[df.columns[1:]].apply(lambda x: x.T)
                0     1     2     3     4     5
Equipment
1         A  10.0  11.0   NaN   NaN   NaN   NaN
          B  10.0  11.0   NaN   NaN   NaN   NaN
          C  10.0  11.0   NaN   NaN   NaN   NaN
2         A   NaN   NaN  12.0  13.0   NaN   NaN
          B   NaN   NaN  12.0  13.0   NaN   NaN
          C   NaN   NaN  12.0  13.0   NaN   NaN
3         A   NaN   NaN   NaN   NaN  14.0  15.0
          B   NaN   NaN   NaN   NaN  14.0  15.0
          C   NaN   NaN   NaN   NaN  14.0  15.0
See the values in the (2, 3) and (4, 5) columns: I want all of them in the (0, 1) columns instead. That is why the index is reset with drop=True, which gives:
0 1
Equipment
1 A 10 11
B 10 11
C 10 11
2 A 12 13
B 12 13
C 12 13
3 A 14 15
B 14 15
C 14 15
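The renumbering that reset_index performs inside apply can also be done up front with groupby.cumcount, followed by an unstack. This is only a sketch of an alternative route to the same two-level header, not the method used above; the names row and wide are made up for the illustration, and the frame is rebuilt from the table in the question.

```python
import pandas as pd

# Data copied from the original dataframe in the question
df = pd.DataFrame({
    "Equipment": [1, 1, 2, 2, 3, 3],
    "A": [10, 11, 12, 13, 14, 15],
    "B": [10, 11, 12, 13, 14, 15],
    "C": [10, 11, 12, 13, 14, 15],
})

# Position of each row inside its equipment group: 0, 1, 0, 1, 0, 1
row = df.groupby("Equipment").cumcount().rename("row")

wide = (
    df.set_index([row, "Equipment"])  # index levels: (row, Equipment)
      .unstack("Equipment")           # columns: (A/B/C, Equipment)
      .swaplevel(axis=1)              # columns: (Equipment, A/B/C)
      .sort_index(axis=1)             # keep each equipment's columns together
)
print(wide)
```

Because every group is numbered from 0 before reshaping, no NaN padding appears and the values keep their integer dtype.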
Experiment with the code to see what happens at each step.
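For the box plot part of the question, the original dataframe may already be enough. The sketch below assumes matplotlib is installed and relies on the boxplot method of a pandas groupby object, which draws one subplot per group with one box per selected column. The layout and figsize values are arbitrary choices for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Data copied from the original dataframe in the question
df = pd.DataFrame({
    "Equipment": [1, 1, 2, 2, 3, 3],
    "A": [10, 11, 12, 13, 14, 15],
    "B": [10, 11, 12, 13, 14, 15],
    "C": [10, 11, 12, 13, 14, 15],
})

# One subplot per equipment (1, 2, 3), each holding the boxes for A, B and C
axes = df.groupby("Equipment").boxplot(
    column=["A", "B", "C"], layout=(1, 3), figsize=(9, 3)
)
plt.tight_layout()
```

In an interactive session, drop the matplotlib.use line and call plt.show() to display the figure.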

Related

python rolling product on non-adjacent row

I would like to calculate rolling product of non-adjacent row, such as product of values in every fifth row as shown in the photo (result in blue cell is the product of data in blue cell etc.)
The best way I can do now is the following;
temp = pd.DataFrame([range(20)]).transpose()
df = temp.copy()
df['shift1'] = temp.shift(5)
df['shift2'] = temp.shift(10)
df['shift3'] = temp.shift(15)
result = df.product(axis=1)
However, this is cumbersome because I want to change the row step dynamically.
Can anyone tell me if there is a better way to do this?
Thank you
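You can use groupby.cumprod/groupby.prod with the row position modulo 5 as the grouper (this assumes the values are in a column named 'data'):
import numpy as np
import pandas as pd
# a Series rather than a bare array, so that .duplicated() is available below
m = pd.Series(np.arange(len(df)) % 5, index=df.index)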
# option 1
df['result'] = df.groupby(m)['data'].cumprod()
# option 2
df.loc[~m.duplicated(keep='last'), 'result2'] = df.groupby(m)['data'].cumprod()
# or
# df.loc[~m.duplicated(keep='last'),
# 'result2'] = df.groupby(m)['data'].prod().to_numpy()
output:
data result result2
0 0 0 NaN
1 1 1 NaN
2 2 2 NaN
3 3 3 NaN
4 4 4 NaN
5 5 0 NaN
6 6 6 NaN
7 7 14 NaN
8 8 24 NaN
9 9 36 NaN
10 10 0 NaN
11 11 66 NaN
12 12 168 NaN
13 13 312 NaN
14 14 504 NaN
15 15 0 0.0
16 16 1056 1056.0
17 17 2856 2856.0
18 18 5616 5616.0
19 19 9576 9576.0
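Since the question asks for a step that can change dynamically, the modulo grouping can be wrapped in a small helper. This is a minimal sketch: the function name strided_cumprod and the column name data are assumptions made for the illustration.

```python
import numpy as np
import pandas as pd

def strided_cumprod(values: pd.Series, step: int) -> pd.Series:
    """Running product over rows i, i - step, i - 2*step, ... for each row i."""
    # Rows whose positions share the same remainder modulo `step` form one group
    groups = pd.Series(np.arange(len(values)) % step, index=values.index)
    return values.groupby(groups).cumprod()

df = pd.DataFrame({"data": range(20)})
df["result"] = strided_cumprod(df["data"], step=5)
print(df.tail())
```

Changing step to another value regroups the rows without adding any shifted helper columns.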

Difference between previous row and next row in pandas gives NaN for first value

I am new to using Pandas and I have a dataframe df as given below
A B
0 4 5
1 5 8
2 6 11
3 7 13
4 8 15
5 9 30
6 10 477
7 11 3643
8 12 33469
9 13 141409
10 14 335338
11 15 365115
I want to get the difference between each row and the previous row for the B column.
I used df.set_index('A').diff() but it gives NaN for the first row. How can I get 5 there?
A B
4 NaN
5 3.0
6 3.0
7 2.0
8 2.0
9 15.0
10 447.0
11 3166.0
12 29826.0
13 107940.0
14 193929.0
15 29777.0
diff() has no previous row to subtract for the first row, so fill that NaN with the original value of B:
df.B.diff().fillna(df.B)
0 5.0
1 3.0
2 3.0
3 2.0
4 2.0
5 15.0
6 447.0
7 3166.0
8 29826.0
9 107940.0
10 193929.0
11 29777.0
Name: B, dtype: float64
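As a self-contained check, the sketch below rebuilds the first six rows of the question's data and applies the same idea. It also shows a variant based on shift with fill_value=0, which is not part of the answer above but keeps the integer dtype. The variable names are illustrative.

```python
import pandas as pd

# First rows of the dataframe from the question
df = pd.DataFrame({"A": range(4, 10), "B": [5, 8, 11, 13, 15, 30]})

# fillna aligns on the index, so only the first row (the single NaN) takes its value from B
out = df["B"].diff().fillna(df["B"])

# Variant: subtract the previous row, treating the missing predecessor as 0
out_int = df["B"] - df["B"].shift(fill_value=0)

print(out.tolist())
print(out_int.tolist())
```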

How to add 1 to previous data if NaN in pandas

I was wondering if it is possible to add 1 (or n) to missing values in a pandas DataFrame / Series.
For example:
1
10
nan
15
25
nan
nan
nan
30
Would return :
1
10
11
15
25
26
27
28
30
Thank you,
Use .ffill to carry the last valid value forward, then add the result of a groupby.cumcount, which counts how many rows have passed since that value:
df[0].ffill() + df.groupby(df[0].notnull().cumsum()).cumcount()
0 1.0
1 10.0
2 11.0
3 15.0
4 25.0
5 26.0
6 27.0
7 28.0
8 30.0
dtype: float64
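The question also mentions adding n instead of 1. A minimal sketch of that generalisation, on a plain Series holding the example data (the names block, offset and filled are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 10, np.nan, 15, 25, np.nan, np.nan, np.nan, 30])
n = 1  # increment applied for each consecutive missing value

# Every valid value starts a new block; the NaNs that follow belong to it
block = s.notna().cumsum()
# Position inside the block: 0 for the valid value, then 1, 2, 3, ... for the NaNs
offset = s.groupby(block).cumcount()

filled = s.ffill() + n * offset
print(filled.tolist())
```

A run of NaN at the very start of the Series has nothing to fill from and stays NaN.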

Applying multiple functions to a pivot table (grouped) dataframe

I currently have a dataframe which looks like this:
df:
store item sales
0 1 1 10
1 1 2 20
2 2 1 10
3 3 2 20
4 4 3 10
5 3 4 15
...
I wanted to view the total sales of each item for each store, so I used a pivot table to create this:
p_table = pd.pivot_table(df, index='store', values='sales', columns='item', aggfunc=np.sum)
which gives something like:
sales
item 1 2 3 4
store
1 20 30 10 8
2 10 14 12 13
3 1 23 29 10
....
What I want to do now is apply a function so that each item's total sales is expressed as a percentage of the total sales of its store. For example, the value for item 1 at store 1 would become:
20/(20+30+10+8) * 100
I am struggling to do this for a stacked dataframe. Any suggestions would be much appreciated.
Thanks
I think you need to divide with div by the Series of row totals created by sum:
print (p_table)
item 1 2 3 4
store
1 10.0 20.0 NaN NaN
2 10.0 NaN NaN NaN
3 NaN 20.0 NaN 15.0
4 NaN NaN 10.0 NaN
print (p_table.sum(axis=1))
store
1 30.0
2 10.0
3 35.0
4 10.0
dtype: float64
out = p_table.div(p_table.sum(axis=1), axis=0)
print (out)
item 1 2 3 4
store
1 0.333333 0.666667 NaN NaN
2 1.000000 NaN NaN NaN
3 NaN 0.571429 NaN 0.428571
4 NaN NaN 1.0 NaN
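The result above is a fraction of the store total, while the question asks for a percentage. A short sketch of the full path from the six sample rows to percentages follows; the name pct is illustrative, and aggfunc is passed as the string 'sum' here.

```python
import pandas as pd

# The six sample rows shown in the question
df = pd.DataFrame({
    "store": [1, 1, 2, 3, 4, 3],
    "item":  [1, 2, 1, 2, 3, 4],
    "sales": [10, 20, 10, 20, 10, 15],
})

p_table = pd.pivot_table(df, index="store", columns="item",
                         values="sales", aggfunc="sum")

# Divide every row by its own total (axis=0 aligns on the store index), then scale to percent
pct = p_table.div(p_table.sum(axis=1), axis=0).mul(100)
print(pct.round(2))
```

Each row of pct sums to 100 because sum skips the NaN cells of items a store did not sell.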

based on a value in column A, shift the values in columns C and D to the right in a pandas dataframe

How can I achieve the desired result from the following dataset?
A B C D E
1 apple 5 2 20 NaN
2 orange 2 6 30 NaN
3 apple 6 1 40 NaN
4 apple 10 3 50 NaN
5 banana 8 9 60 NaN
Desired Result :
A B C D E
1 apple 5 NaN 2 20
2 orange 2 6 30 NaN
3 apple 6 NaN 1 40
4 apple 10 NaN 3 50
5 banana 8 9 60 NaN
IIUC you can use np.roll on the rows of interest, here we need to select only the rows where 'A' is 'apple' and then roll these by a single column row-wise and assign back:
In [14]:
df.loc[df['A']=='apple', 'C':] = np.roll(df.loc[df['A']=='apple', 'C':], 1,axis=1)
df
Out[14]:
A B C D E
1 apple 5 NaN 2 20.0
2 orange 2 6.0 30 NaN
3 apple 6 NaN 1 40.0
4 apple 10 NaN 3 50.0
5 banana 8 9.0 60 NaN
Note that introducing NaN values changes the dtype of the affected columns to float, since an integer column cannot hold NaN.
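A self-contained version of the same np.roll idea is sketched below. It casts the three columns to float beforehand so the NaN values can be stored without a dtype change during assignment; that cast, and the names mask and cols, are choices made for this illustration. Keep in mind that np.roll wraps around: the last column's value moves into C, which gives the desired result here only because E is NaN in the rolled rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["apple", "orange", "apple", "apple", "banana"],
        "B": [5, 2, 6, 10, 8],
        "C": [2, 6, 1, 3, 9],
        "D": [20, 30, 40, 50, 60],
        "E": [np.nan] * 5,
    },
    index=[1, 2, 3, 4, 5],
)

mask = df["A"] == "apple"
cols = ["C", "D", "E"]

# Cast first so that C and D can hold NaN
df[cols] = df[cols].astype(float)
# Roll each selected row one position to the right; E (NaN) wraps around into C
df.loc[mask, cols] = np.roll(df.loc[mask, cols].to_numpy(), 1, axis=1)
print(df)
```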