I'm trying to convert a financial income statement spreadsheet into a table that I can easily plug into Power BI. The spreadsheet columns are as follows:
['Date', 'Budget Flag', 'Department', 'Account 1', 'Account 2', ..., 'Account N']
There are 78 columns of Accounts, and I need to consolidate. When I'm done, the spreadsheet columns need to be arranged like this:
['Date', 'Budget Flag', 'Department', 'Account Name', 'Value']
Here's what I've tried:
import pandas as pd

statement = pd.read_excel('Financial Dashboard Dataset.xlsx', sheet_name="Income Statement", header=0)

appended_data = []
for i in range(3, 81):  # the 78 account columns start at index 3
    value = statement.columns[i]
    pivot = pd.pivot_table(statement, values=value, index=['Date', 'Budget Flag', 'Department'])
    pivot = pivot.assign(Account=value)
    appended_data.append(pivot)
appended_data = pd.concat(appended_data)
pivot.to_excel("final_product.xlsx", index=True, header=True)
So I'm essentially trying to pivot the values of each Account column against the three index columns, assign the Account column header as a value for each cell in a new column, and then union all of the pivot tables together.
My problem is that the resultant spreadsheet only contains the pivot table from the last account column. What gives?
The resultant spreadsheet only contains the pivot table from the last account column because that is the dataframe I chose to write to Excel.
I solved my problem by changing the last line to:
appended_data.to_excel("final_product.xlsx", index=True, header=True)
Sorry, all. It's been a long day. Hope this helps someone else!
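As an aside, the same wide-to-long reshape can be done in one step with DataFrame.melt, which avoids the loop entirely. A minimal sketch, assuming the column layout described above:

import pandas as pd

statement = pd.read_excel('Financial Dashboard Dataset.xlsx', sheet_name="Income Statement", header=0)

# Keep the three identifier columns and unpivot the 78 account columns
# into ('Account Name', 'Value') pairs.
long_form = statement.melt(id_vars=['Date', 'Budget Flag', 'Department'],
                           var_name='Account Name', value_name='Value')
long_form.to_excel("final_product.xlsx", index=False, header=True)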
Related
When I load the dataset in pandas it shows 100,000 rows, but when I open it in Excel it shows 300,000 rows. Is there any Python code that could help me deal with this kind of discrepancy?
import pandas as pd

df = pd.read_csv('C_data_2.csv')

# Get the counts of each value in the gender column
counts = df['Gender'].value_counts()

# Find the most common value in the gender column
most_common = counts.index[0]

# Impute missing values in the gender column with the most common value
df['Gender'] = df['Gender'].fillna(most_common)

# Replace any literal "nan" strings with most_common as well; assigning the
# result back avoids the chained-assignment pitfall of inplace=True on a column
df['Gender'] = df['Gender'].replace('nan', most_common)
I am trying to get a row total and a column total from my dataframe. I have no issue with the column total. However, my row total is adding up all the job descriptions rather than showing a numeric total.
here's my code:
Newdata = data.groupby(['Job Description', 'AgeBand'])['AgeBand'].count().reset_index(name='count')
Newdata = Newdata.sort_values(by=['AgeBand'], ascending=True)

df = Newdata.pivot_table(index='Job Description', values='count', columns='AgeBand').reset_index()

df.loc['Total', :] = df.sum(axis=0)
df.loc[:, 'Total'] = df.sum(axis=1)
df = df.fillna(0).astype(int, errors='ignore')
df
First preselect the columns you wish to add row-wise, then use df.sum(axis=1).
I think you're after:
df.loc[:,'Total'] = df.loc[:,'20-29':'UP TO 20'].sum(axis=1)
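If the age-band labels vary between exports, a variant that avoids hard-coding the column range (assuming the count columns are the only numeric ones) might look like:

# Select only the numeric count columns so the 'Job Description'
# strings can't leak into the totals.
counts = df.select_dtypes(include='number')
df['Total'] = counts.sum(axis=1)
df.loc['Total', counts.columns] = counts.sum(axis=0)
df.loc['Total', 'Total'] = counts.to_numpy().sum()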
I'm new to pandas. I'm trying to add columns to my df. There are multiple columns in the csv. The names of the columns include, "Name", "Date", ..., "Problem", "Problem.1", "Problem.2" etc. The user is going to be downloading the files at different times and the number of problems will change so I can't just list the problems.
I only want the columns: Name, Date, and all columns whose name contains the word "Problem".
I know this isn't correct but the idea is...
df = df['Name', 'Date', df.filter(regex='Problem')]
Any help is appreciated. Thank you in advance!!!
Use this:
df[['Name', 'Date'] + [col for col in df.columns if 'Problem' in col]]
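Alternatively, since you were already reaching for df.filter, you can combine it with the fixed columns. This sketch assumes every problem column's name starts with 'Problem':

cols = ['Name', 'Date'] + df.filter(regex=r'^Problem').columns.tolist()
df = df[cols]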
I want to filter a pandas data-frame to only keep columns that contain a certain wildcard and then keep the two columns directly to the right of each match.
The Dataframe is tracking pupil grades, overall total and feedback. I only want to keep the data that corresponds to Homework and not other assessments. So in the example below I would want to keep First Name, Last Name, any homework column and the corresponding points and feedback column which are always exported to the right of this.
First Name,Last Name,Understanding Business Homework,Points,Feedback,Past Paper Homework,Points,Feedback,Groupings/Structures Questions,Points,Feedback
import pandas as pd
import numpy as np

all_data = all_data.filter(like=('Homework') and ('First Name') and ('Second Name') and ('Points'), axis=1)
print(all_data.head())
export_csv = all_data.to_csv(r'C:\Users\Sandy\Python\Automate_the_Boring_Stuff\new.csv', index=None, header=True)
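Note that "and" between strings won't do what you want there: it simply evaluates to the last string, so filter only ever sees 'Points'. One way to keep each Homework column plus the two columns to its right is positional slicing over the header. A sketch, assuming all_data has already been read from the exported CSV (pandas mangles the duplicate Points/Feedback headers to Points.1, Feedback.1, and so on, so the captured names stay unique):

cols = list(all_data.columns)
keep = ['First Name', 'Last Name']
for i, name in enumerate(cols):
    if 'Homework' in name:
        keep.extend(cols[i:i + 3])  # the Homework column plus the two to its right

all_data = all_data[keep]
all_data.to_csv(r'C:\Users\Sandy\Python\Automate_the_Boring_Stuff\new.csv', index=None, header=True)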
I have a list of transactions in a data frame and want to group by Symbols and take the sum of one of the columns. Additionally, I want the first instance of this column (per symbol).
My code:
import pandas as pd

local_filename = r'C:\Users\nshah\Desktop\Naman\TEMPLATE.xlsx'  # raw string so the backslashes aren't treated as escapes
data_from_local_file = pd.read_excel(local_filename, sheet_name='JP_A')
data_from_local_file = data_from_local_file[['Symbol', 'Security Name', 'Counterparty', 'Manager', 'Rate', 'LocatedAmt']]

data_grouped = data_from_local_file.groupby(['Symbol'])
pivoted = data_grouped['LocatedAmt'].sum().reset_index()
Next, I want the first instance of, say, Rate for each Symbol.
Thank you in advance!
You can get the sum and the first observed instance as follows:

data_grouped = data_from_local_file.groupby(['Symbol'], as_index=False).agg({'LocatedAmt': ['sum', 'first']})

To do this for every column, pass the aggregation functions across all the non-key columns (including the Symbol key itself in the selection would just produce a redundant aggregate of the key):

value_cols = ['Security Name', 'Counterparty', 'Manager', 'Rate', 'LocatedAmt']
data_grouped_all = data_from_local_file.groupby('Symbol')[value_cols].agg(['sum', 'first'])
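If you're on pandas 0.25 or newer, named aggregation gives flat column names, which is often easier to work with downstream. A sketch:

# Each keyword becomes an output column: (source column, aggregation)
result = data_from_local_file.groupby('Symbol', as_index=False).agg(
    total_located=('LocatedAmt', 'sum'),
    first_rate=('Rate', 'first'),
)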