average from sum of numbers - sum

I have to take the average of the following:
10/2 = 5
20/4 = 5
30/5 = 6
(5+5+6) / 3 = 5.333
Why does the following not give me the same result?
(10+20+30) / (5+5+6) is not 5.333
Thanks.
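A minimal sketch of the arithmetic, in plain Python. The expression in the question divides the sum of the numerators by the sum of the quotients (5+5+6), which is a different quantity altogether. Even if the intent was to divide by the sum of the original denominators (2+4+5, an assumption on my part), the result is a weighted average in which each quotient counts in proportion to its denominator, so it generally differs from the plain mean of the quotients.

```python
numerators = [10, 20, 30]
denominators = [2, 4, 5]

# The three quotients from the question: 5.0, 5.0, 6.0
ratios = [n / d for n, d in zip(numerators, denominators)]

# Plain (unweighted) mean of the quotients: 16 / 3
mean_of_ratios = sum(ratios) / len(ratios)

# The expression as written in the question: 60 / 16
sum_over_quotients = sum(numerators) / sum(ratios)

# Assumed intent: sum of numerators over sum of denominators, 60 / 11.
# This weights each quotient by its denominator.
ratio_of_sums = sum(numerators) / sum(denominators)

print(round(mean_of_ratios, 3), sum_over_quotients, round(ratio_of_sums, 3))
```

The two averages coincide only when all denominators are equal, or when all quotients are equal.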

Related

Efficient way to do an incremental groupby in pandas

I would like to do an "incremental groupby". I have the following dataframe:
v1 increment
0.1 0
0.5 0
0.42 1
0.4 1
0.3 2
0.7 2
I would like to compute the average of column v1 by incrementally grouping on the column "increment". For instance, for the first groupby (increment = 0), I would get the average of the first two rows. For the second groupby, I would get the average of the first 4 rows (both increment = 0 and 1), and for the third groupby, the average over increment = 0, 1 and 2.
Any idea how I could do that efficiently?
Expected output:
group average of v1
0 0.3
1 0.355
2 0.403
You can compute the cumulated sum and the cumulated size, then divide:
g = df.groupby('increment')['v1']  # set up a grouper for efficiency
out = (g.sum().cumsum()            # cumulated sum
        .div(g.size().cumsum())    # divide by cumulated size
        .reset_index(name='average of v1')
      )
output:
increment average of v1
0 0 0.300000
1 1 0.355000
2 2 0.403333
You can take the cumulative sum of the per-group sums of v1, then divide it by the cumulative sum of the group sizes:
cumsum = df.groupby('increment')['v1'].sum().cumsum()
cumsize = df.groupby('increment')['v1'].size().cumsum()
out = (cumsum.div(cumsize)
       .to_frame('average of v1')
       .reset_index())
print(out)
increment average of v1
0 0 0.300000
1 1 0.355000
2 2 0.403333
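For reference, a self-contained sketch of the same idea that builds the sample frame from the question and gets both running totals from a single aggregation. It uses "count" rather than size, which is an assumption that v1 has no missing values (count skips NaN, size does not).

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "v1": [0.1, 0.5, 0.42, 0.4, 0.3, 0.7],
    "increment": [0, 0, 1, 1, 2, 2],
})

# Per-group sum and count, then running totals across the groups
stats = df.groupby("increment")["v1"].agg(["sum", "count"]).cumsum()

# Running sum divided by running count gives the incremental average
out = (stats["sum"] / stats["count"]).reset_index(name="average of v1")
print(out)
```

Dividing the cumulative sum by the cumulative count, rather than averaging the group means, keeps the result correct when the groups have different sizes.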

How to add Multilevel Columns and create new column?

I am trying to create a "total" column in my dataframe
idx = pd.MultiIndex.from_product([['Room 1','Room 2', 'Room 3'],['on','off']])
df = pd.DataFrame([[1,4,3,6,5,15], [3,2,1,5,1,7]], columns=idx)
My dataframe
Room 1 Room 2 Room 3
on off on off on off
0 1 4 3 6 5 15
1 3 2 1 5 1 7
For each room, I want to create a total column and then an on% column.
I have tried the following, but it does not work:
df.loc[:, slice(None), "total" ] = df.xs('on', axis=1,level=1) + df.xs('off', axis=1,level=1)
Let us try something fancy ~
df.stack(0).eval('total=on + off \n on_pct=on / total').stack().unstack([1, 2])
Room 1 Room 2 Room 3
off on total on_pct off on total on_pct off on total on_pct
0 4.0 1.0 5.0 0.2 6.0 3.0 9.0 0.333333 15.0 5.0 20.0 0.250
1 2.0 3.0 5.0 0.6 5.0 1.0 6.0 0.166667 7.0 1.0 8.0 0.125
This was a rough one, but you can do it like this if you want to avoid explicit loops. Note that it redefines your df twice, because the percentage step needs the total columns from the first step. Sorry about that, but it is the best I could do. If you have any questions, just comment.
import numpy as np  # needed for np.arange below
# note: groupby(..., axis=1) is deprecated in pandas 2.1 and later
df = pd.concat([y.assign(**{'Total {0}'.format(x+1): y.iloc[:,0] + y.iloc[:,1]}) for x, y in df.groupby(np.arange(df.shape[1])//2, axis=1)], axis=1)
df = pd.concat([y.assign(**{'Percentage_Total{0}'.format(x+1): (y.iloc[:,0] / y.iloc[:,2])*100}) for x, y in df.groupby(np.arange(df.shape[1])//3, axis=1)], axis=1)
print(df)
This groups by the column's first index (rooms) and then loops through each group to add the total and percent on. The final step is to reindex using the unique rooms:
import pandas as pd
idx = pd.MultiIndex.from_product([['Room 1','Room 2', 'Room 3'],['on','off']])
df = pd.DataFrame([[1,4,3,6,5,15], [3,2,1,5,1,7]], columns=idx)
for room, group in df.groupby(level=0, axis=1):
    df[(room, 'total')] = group.sum(axis=1)
    df[(room, 'pct_on')] = group[(room, 'on')] / df[(room, 'total')]
result = df.reindex(columns=df.columns.get_level_values(0).unique(), level=0)
Output:
Room 1 Room 2 Room 3
on off total pct_on on off total pct_on on off total pct_on
0 1 4 5 0.2 3 6 9 0.333333 5 15 20 0.250
1 3 2 5 0.6 1 5 6 0.166667 1 7 8 0.125
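An alternative sketch that avoids grouping over columns altogether: take the on and off cross-sections, compute the new measures on those, and stitch everything back together with pd.concat. The intermediate names (on, off, measures, wanted) are mine, not from the question.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([['Room 1', 'Room 2', 'Room 3'], ['on', 'off']])
df = pd.DataFrame([[1, 4, 3, 6, 5, 15], [3, 2, 1, 5, 1, 7]], columns=idx)

# One column per room for each measure
on = df.xs('on', axis=1, level=1)
off = df.xs('off', axis=1, level=1)
total = on + off

measures = {'on': on, 'off': off, 'total': total, 'pct_on': on / total}

# concat puts the dict keys on the outer column level: (measure, room)
out = pd.concat(measures, axis=1).swaplevel(axis=1)

# Reorder so that each room's four columns sit next to each other
wanted = pd.MultiIndex.from_product([on.columns, list(measures)])
out = out.reindex(columns=wanted)
print(out)
```

Because all arithmetic happens on whole frames with matching room columns, nothing here depends on the position of on and off within each room.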

How to implement this formula into pandas dataframe's column?

How can I implement a "3 Day Average Sale %" formula as a column of a pandas dataframe?
I have a dataframe:
No Sale 3 Day Average Sale %
1 4786
2 7546
3 2578
4 6974 ( (No4 - ((No3+No2+No1)/3)) / ((No3+No2+No1)/3) ) * 100
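Try rolling over 4 elements at a time and applying a custom function:
def average_sale_percent(x):
    # x holds 4 consecutive sales: the 3 previous days, then the current day
    three_day_avg = sum(x[:3]) / 3
    return ((x[3] - three_day_avg) / three_day_avg) * 100
# raw=True passes each window as a NumPy array, so x[3] is positional
df.Sale.rolling(4).apply(average_sale_percent, raw=True)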
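The same formula can also be sketched without a custom function, by shifting the column one row and taking a 3-row rolling mean, so each row sees the average of the three rows before it. The frame below rebuilds the sample from the question; prev3_avg is a name I introduced.

```python
import pandas as pd

df = pd.DataFrame({"No": [1, 2, 3, 4], "Sale": [4786, 7546, 2578, 6974]})

# Average of the three previous rows (NaN until three earlier rows exist)
prev3_avg = df["Sale"].shift(1).rolling(3).mean()

# Percentage difference between the current sale and that average
df["3 Day Average Sale %"] = (df["Sale"] - prev3_avg) / prev3_avg * 100
print(df)
```

The first three rows come out as NaN, since they do not have three earlier rows to average.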

Forcing dataframe recalculation after a change of a specific cell

I start with a simple
df = pd.DataFrame({'units':[30,20]})
And I get
units
0 30
1 20
I then add a row to total the column:
my_sum = df.sum()
df = pd.concat([df, my_sum.to_frame().T], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
Finally, I add a column to calculate percentages off of the 'units' column:
df['pct'] = df.units / df.units[:-1].sum()
ending with this:
units pct
0 30 0.6
1 20 0.4
2 50 1.0
So far so good - but now the question: I want to change the middle number of units from 20 to, for example, 40. I can use this:
df.iloc[1, 0] = 40
or
df.iat[1, 0] = 40
which changes the cell, but the calculated values in the last row and the second column don't change to reflect it:
units pct
0 30 0.6
1 40 0.4
2 50 1.0
How do I force these calculated values to adjust following the change in that particular cell?
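Make a function that computes the total row and the percentages from the raw data, and call it again after each change:
def f(df):
    # append the column totals as a last row (DataFrame.append was removed in pandas 2.0)
    total = df.sum().to_frame().T
    return pd.concat([df, total], ignore_index=True).assign(
        pct=lambda d: d.units / d.units.iat[-1])
df.iat[1, 0] = 40
f(df)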
units pct
0 30 0.428571
1 40 0.571429
2 70 1.000000

How to rename the column of an intermediate result?

I'm calculating an average by first getting the number of months and then dividing the number of records by that number, like this:
monthly = tables[SUB_ACCT_DOC_ACC_MTHLY_SUM]
num_months = monthly.clndr_yr_month.unique().size
df = (monthly[["sub_acct_id", "clndr_yr_month"]].groupby(["sub_acct_id"]).size() / num_months).reset_index("sub_acct_id")
df.head(5)
What I get is
sub_acct_id 0
0 12716D 242.0
1 12716G 241.5
2 12716K 165.0
3 12716N 92.5
4 12716R 156.5
but how can I rename the new column to e.g. "avg"?
sub_acct_id avg
0 12716D 242.0
1 12716G 241.5
2 12716K 165.0
3 12716N 92.5
4 12716R 156.5
You can access the names with the columns attribute of the dataframe:
df.columns = ['sub_acct_id','avg']
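Another option is to name the column at the moment the Series becomes a frame, via the name argument of Series.reset_index. A minimal sketch follows; the monthly frame here is a made-up stand-in for the table in the question, and only the column names are taken from it.

```python
import pandas as pd

# Hypothetical stand-in for the monthly table from the question
monthly = pd.DataFrame({
    "sub_acct_id": ["A", "A", "A", "B"],
    "clndr_yr_month": ["2020-01", "2020-02", "2020-02", "2020-01"],
})
num_months = monthly["clndr_yr_month"].nunique()

avg = (
    monthly.groupby("sub_acct_id").size()  # Series: records per sub-account
    .div(num_months)                       # average records per month
    .reset_index(name="avg")               # name the value column directly
)
print(avg)
```

If the frame already exists, df.rename(columns={0: "avg"}) changes just that one column and leaves the other names untouched.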