Pandas DataFrame Groupby Manipulation - pandas

I have a list of transactions in a data frame and want to group by Symbols and take the sum of one of the columns. Additionally, I want the first instance of this column (per symbol).
My code:
import pandas as pd
# Raw string so the backslashes in the Windows path are not treated as escapes
local_filename = r'C:\Users\nshah\Desktop\Naman\TEMPLATE.xlsx'
data_from_local_file = pd.read_excel(local_filename, sheet_name='JP_A')
data_from_local_file = data_from_local_file[['Symbol','Security Name', 'Counterparty', 'Manager', 'Rate', 'LocatedAmt']]
data_grouped = data_from_local_file.groupby(['Symbol'])
pivoted = data_grouped['LocatedAmt'].sum().reset_index()
Next, I want the first instance of another column, say Rate, for each symbol.
Thank you in advance!

You can achieve the sum and first observed instance as follows:
data_grouped = data_from_local_file.groupby('Symbol', as_index=False).agg({'LocatedAmt': ['sum', 'first']})
To accomplish this for all the other columns, select them and pass the list of functions to agg. Keep in mind that for text columns 'sum' concatenates the strings rather than adding numbers, so 'first' is usually the useful result there:
all_cols = ['Security Name', 'Counterparty', 'Manager', 'Rate', 'LocatedAmt']
data_grouped_all = data_from_local_file.groupby('Symbol')[all_cols].agg(['sum', 'first'])
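If the goal is the sum of LocatedAmt together with the first Rate per symbol, named aggregation is one option that keeps the result columns flat. This is only a sketch: the sample rows and the output column names total_located and first_rate are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel sheet
transactions = pd.DataFrame({
    "Symbol": ["AAA", "AAA", "BBB"],
    "Rate": [0.5, 0.7, 1.2],
    "LocatedAmt": [100, 200, 50],
})

# Named aggregation: output_name=(input_column, aggregation)
summary = transactions.groupby("Symbol", as_index=False).agg(
    total_located=("LocatedAmt", "sum"),  # sum per symbol
    first_rate=("Rate", "first"),         # first observed rate per symbol
)
print(summary)
```

Each keyword becomes one output column, so there is no MultiIndex on the columns to flatten afterwards.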

Related

Concatenating Pandas Pivot Tables

I'm trying to convert a financial income statement spreadsheet into a table that I can easily plug into Power BI. The spreadsheet columns are as follows:
['Date', 'Budget Flag', 'Department', 'Account 1', 'Account 2', ..., 'Account N']
There are 78 columns of Accounts, and I need to consolidate. When I'm done, the spreadsheet columns need to be arranged like this:
['Date', 'Budget Flag', 'Department', 'Account Name', 'Value']
Here's what I've tried:
import pandas as pd
statement = pd.read_excel('Financial Dashboard Dataset.xlsx', sheet_name="Income Statement", header=0)
appended_data = []
for i in range(3,81):
    value = statement.columns[i]
    pivot = pd.pivot_table(statement, values=value, index=['Date', 'Budg', 'Depa'])
    pivot = pivot.assign(Account=value)
    appended_data.append(pivot)
appended_data = pd.concat(appended_data)
pivot.to_excel("final_product.xlsx", index=True, header=True)
So I'm essentially trying to pivot the values of each Account column against the three index columns, assign the Account column header as a value for each cell in a new column, and then union all of the pivot tables together.
My problem is that the resultant spreadsheet only contains the pivot table from the last account column. What gives?
The resultant spreadsheet only contains the pivot table from the last account column because pivot, the dataframe from the final loop iteration, is what I chose to write to Excel instead of the concatenated appended_data.
I solved my problem by changing the last line to:
appended_data.to_excel("final_product.xlsx", index=True, header=True)
Sorry, all. It's been a long day. Hope this helps someone else!
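As an alternative to looping over pivot tables, reshaping from wide to long can usually be done in one step with melt. This is a different technique from the one above, shown here as a sketch on made-up data with only two account columns; note that melt keeps every row, whereas pivot_table averages duplicate index combinations by default.

```python
import pandas as pd

# Hypothetical miniature of the income statement sheet
statement = pd.DataFrame({
    "Date": ["2023-01-31", "2023-01-31"],
    "Budget Flag": ["Actual", "Budget"],
    "Department": ["Sales", "Sales"],
    "Account 1": [100.0, 120.0],
    "Account 2": [50.0, 40.0],
})

id_cols = ["Date", "Budget Flag", "Department"]

# Every column not listed in id_vars becomes a (name, value) pair
long_format = statement.melt(
    id_vars=id_cols,
    var_name="Account Name",
    value_name="Value",
)
print(long_format)
```

The result has one row per original row and account column, already in the layout ['Date', 'Budget Flag', 'Department', 'Account Name', 'Value'].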

Select a specific row from a multiindex dataframe in pandas

I would like to select the last row from a multiindex dataframe and append it to a dict of buy and sell signals. For example, given a multiindex dataframe indexed by code and time_key, I would like to select the last row (indexed by HK.00700 and 2022-06-28 10:39:00) and add it to the dict while keeping the last row's multiindex values.
Reproduce your data
import pandas as pd
level = [['HK.00700'], [pd.Timestamp('2022-06-28 10:38:00'), pd.Timestamp('2022-06-28 10:39:00')]]
level_index = pd.MultiIndex.from_product(level, names=['code', 'time_key'])
transaction = {
    'open': [360.6, 360.8],
    'close': [360.6, 361.4],
    'high': [360.8, 361.4],
    'low': [360.4, 360.4],
    'volume': [72500, 116300],
    'upper_band': [360.906089, 361.180835],
    'lower_band': [357.873911, 357.719165]
}
df = pd.DataFrame(data=transaction, index=level_index)
df
Selecting only the last row is straightforward:
df.tail(1)
To turn it into a dict, with the index levels included as keys:
df.tail(1).reset_index().loc[0].to_dict()
### Output
{'code': 'HK.00700',
'time_key': Timestamp('2022-06-28 10:39:00'),
'open': 360.8,
'close': 361.4,
'high': 361.4,
'low': 360.4,
'volume': 116300,
'upper_band': 361.180835,
'lower_band': 357.719165}
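If the index values should act as the key of the signals dict rather than as entries inside it, the name attribute of the last row holds them as a tuple. A minimal sketch, assuming a dict called signals (a hypothetical name) and a trimmed-down version of the data above:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["HK.00700"], pd.to_datetime(["2022-06-28 10:38:00", "2022-06-28 10:39:00"])],
    names=["code", "time_key"],
)
df = pd.DataFrame({"open": [360.6, 360.8], "close": [360.6, 361.4]}, index=index)

signals = {}  # hypothetical container for buy/sell signals

last = df.iloc[-1]                    # last row as a Series
signals[last.name] = last.to_dict()   # last.name is the (code, time_key) tuple
print(signals)
```

This keeps both index levels together as one key, so a later lookup can use the same (code, time_key) pair.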

python: aggregate columns in pivot table with multiindex structure

If I have a multi-index pivot table with 'sum' and 'count' columns for each date, what would be the way to aggregate the total 'sum' and 'count' across all dates?
I want to see additional column with totals for all rows in the table.
Thanks to @Nik03 for the idea. The concat method returns the required data frame, but with a single index level. To add it to the original dataframe, you have to create the total columns first, assign the new values to them, and then move them to the front:
table_to_show = pd.concat([table_to_record.filter(like='sum').sum(1), table_to_record.filter(like='count').sum(1)], axis=1)
table_to_show.columns = ['sum', 'count']
table_to_record['total_sum'] = table_to_show['sum']
table_to_record['total_count'] = table_to_show['count']
column_1st = table_to_record.pop('total_sum')
column_2nd = table_to_record.pop('total_count')
table_to_record.insert(0, 'total_sum', column_1st)
table_to_record.insert(1,'total_count', column_2nd)
One way:
df1 = pd.concat([df.filter(like='sum').sum(1), df.filter(like='mean').sum(1)], axis=1)
df1.columns = ['sum', 'mean']
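For reference, here is a small self-contained sketch of the same idea on made-up data. It assumes the pivot table was built with aggfunc=['sum', 'count'], so the top column level holds the function name, and it selects that level directly instead of using filter; the column names item, date and amount are hypothetical.

```python
import pandas as pd

# Hypothetical raw data
raw = pd.DataFrame({
    "item": ["a", "a", "b", "b", "b"],
    "date": ["d1", "d2", "d1", "d2", "d2"],
    "amount": [1, 2, 3, 4, 5],
})

# Columns become a MultiIndex: (aggregation, date)
table = pd.pivot_table(
    raw, index="item", columns="date", values="amount",
    aggfunc=["sum", "count"],
)

# Select each top-level block and add it up across the dates
totals = pd.DataFrame({
    "total_sum": table["sum"].sum(axis=1),
    "total_count": table["count"].sum(axis=1),
})
print(totals)
```

The totals frame shares the row index with the pivot table, so it can be joined back or displayed next to it.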

how to get row total in pandas

I am trying to get the row total and the column total from my dataframe. I have no issue with the column total. However, my row total is adding up all the job descriptions rather than showing the total.
Here's my code:
Newdata= data.groupby(['Job Description','AgeBand'])['AgeBand'].count().reset_index(name="count")
Newdata= Newdata.sort_values(by = ['AgeBand'],ascending=True)
df=Newdata.pivot_table(index='Job Description', values = 'count', columns = 'AgeBand').reset_index()
df.loc['Total',:]= df.sum(axis=0)
df.loc[:,'Total'] = df.sum(axis=1)
df=df.fillna(0).astype(int, errors='ignore')
df
First preselect the columns you wish to add row wise, then use df.sum(axis=1).
I think you're after:
df.loc[:,'Total'] = df.loc[:,'20-29':'UP TO 20'].sum(axis=1)
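Another option, if the counts are built from the raw data anyway, is to let pandas add both totals through the margins argument of crosstab. This is a sketch on made-up rows, not the original dataset:

```python
import pandas as pd

# Hypothetical raw data: one row per employee
data = pd.DataFrame({
    "Job Description": ["Clerk", "Clerk", "Nurse"],
    "AgeBand": ["20-29", "30-39", "20-29"],
})

# margins=True appends a total row and a total column
table = pd.crosstab(
    data["Job Description"], data["AgeBand"],
    margins=True, margins_name="Total",
)
print(table)
```

Because the job descriptions stay in the index here, they are never part of the summation.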

pandas: appending a row to a dataframe with values derived using a user defined formula applied on selected columns

I have a dataframe as
df = pd.DataFrame(np.random.randn(5,4),columns=list('ABCD'))
I can use the following for standard calculations like mean(), sum(), etc.
df.loc['calc'] = df[['A','D']].iloc[2:4].mean(axis=0)
Now I have two questions
1. How can I apply a formula (like exp(mean()) or 2.5*mean()/sqrt(max())) to columns 'A' and 'D' for rows 2 to 4?
2. How can I append a row to the existing df where two values are the mean() of A and D and the other two are the result of a specific formula applied to C and B?
Q1:
You can use .apply() and lambda functions.
df.iloc[2:4,[0,3]].apply(lambda x: np.exp(np.mean(x)))
df.iloc[2:4,[0,3]].apply(lambda x: 2.5*np.mean(x)/np.sqrt(max(x)))
Q2:
You can use dictionaries and combine them and add it as a row.
First one is mean, the second one is some custom function.
ad = dict(df[['A', 'D']].mean())
bc = dict(df[['B', 'C']].apply(lambda x: x.sum()*45))
Combine them:
ad.update(bc)
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df, pd.DataFrame([ad])], ignore_index=True)
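Putting both parts together, a runnable sketch might look like the following. It assumes a seeded random frame in place of the original data, uses exp(mean()) on rows 2 and 3 of A and D as the example formula, and adds the row with pd.concat:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the example is repeatable
df = pd.DataFrame(rng.standard_normal((5, 4)), columns=list("ABCD"))

subset = df.iloc[2:4]  # positional slice: rows 2 and 3

# Custom formula per column for A and D, another one for B and C
ad = subset[["A", "D"]].apply(lambda col: np.exp(col.mean())).to_dict()
bc = df[["B", "C"]].apply(lambda col: col.sum() * 45).to_dict()

# One-row frame built from the merged dicts, appended at the bottom
new_row = pd.DataFrame([{**ad, **bc}])
result = pd.concat([df, new_row], ignore_index=True)
print(result)
```

concat aligns the new row on column names, so the order of the keys in the dict does not matter.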