python: aggregate columns in pivot table with multiindex structure - pandas

If I have a multi-index pivot table like this:
What would be the way to aggregate the total 'sum' and 'count' across all dates?
I want to see additional columns with the totals for every row in the table.
Thanks to #Nik03 for the idea. The concat method returns the required dataframe, but with a single column index level. To add it to the original dataframe, you first have to create the new columns and assign the results to them:
table_to_show = pd.concat([table_to_record.filter(like='sum').sum(1), table_to_record.filter(like='count').sum(1)], axis=1)
table_to_show.columns = ['sum', 'count']
table_to_record['total_sum'] = table_to_show['sum']
table_to_record['total_count'] = table_to_show['count']
column_1st = table_to_record.pop('total_sum')
column_2nd = table_to_record.pop('total_count')
table_to_record.insert(0, 'total_sum', column_1st)
table_to_record.insert(1,'total_count', column_2nd)
and here is the result:

One way:
df1 = pd.concat([df.filter(like='sum').sum(1), df.filter(like='mean').sum(1)], axis=1)
df1.columns = ['sum', 'mean']
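For a self-contained illustration of the same idea, here is a minimal sketch. It assumes the pivot table's columns are a two-level MultiIndex with the aggregation name ('sum', 'count') on the top level and the date on the second level; the frame, its values, and the name `totals` are made up for the example. It selects each top-level group directly rather than using filter(like=...).

```python
import pandas as pd

# Hypothetical pivot-like table: top column level is the aggregation,
# second level is the date (assumed layout, not taken from the question).
columns = pd.MultiIndex.from_product(
    [["sum", "count"], ["2021-01-01", "2021-01-02"]]
)
table = pd.DataFrame(
    [[10, 20, 1, 2], [30, 40, 3, 4]],
    index=["a", "b"],
    columns=columns,
)

# table["sum"] returns all date columns under the "sum" level;
# summing along axis=1 gives one total per row.
totals = pd.DataFrame(
    {
        "total_sum": table["sum"].sum(axis=1),
        "total_count": table["count"].sum(axis=1),
    }
)
print(totals)
```

The totals are kept in a separate single-level frame here; inserting them back into the MultiIndex table works as described in the answer above.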

Related

Pandas how to group by day and other column

I am getting the daily counts of rows from a dataframe using
df = df.groupby(by=df['startDate'].dt.date).count()
How can I modify this so I can also group by another column 'unitName'?
Thank you
Use list with GroupBy.size:
df = df.groupby([df['startDate'].dt.date, 'unitName']).size()
If you need to count non-missing values, e.g. in column col, use DataFrameGroupBy.count:
df = df.groupby([df['startDate'].dt.date, 'unitName'])['col'].count()
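To make the difference between size and count concrete, here is a small sketch on invented data. The column names startDate, unitName and col follow the question and answer; the values are assumptions chosen so that one row has a missing value in col.

```python
import pandas as pd

# Made-up sample data; one value in "col" is missing on purpose.
df = pd.DataFrame(
    {
        "startDate": pd.to_datetime(
            [
                "2021-01-01 08:00",
                "2021-01-01 09:00",
                "2021-01-01 10:00",
                "2021-01-02 08:00",
            ]
        ),
        "unitName": ["A", "A", "B", "A"],
        "col": [1.0, None, 3.0, 4.0],
    }
)

# Group by calendar day and by unit name.
keys = [df["startDate"].dt.date, "unitName"]

# size() counts every row in each group, missing values included.
sizes = df.groupby(keys).size()

# count() on a column counts only the non-missing values.
counts = df.groupby(keys)["col"].count()

print(sizes)
print(counts)
```

For the group (2021-01-01, A), size reports two rows while count reports one, because one of the two values in col is missing.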

How to concat 3 dataframes with each into sequential columns

I'm trying to understand how to concat three individual dataframes (i.e. df1, df2, df3) into a new dataframe, say df4, so that each individual dataframe occupies its own columns in left-to-right order.
I've tried using concat with axis = 1 to do this, but it appears not possible to automate this with a single action.
Table1_updated = pd.DataFrame(columns=['3P','2PG-3Io','3Io'])
Table1_updated=pd.concat([get_table1_3P,get_table1_2P_max_3Io,get_table1_3Io])
Note that, with the exception of get_table1_2P_max_3Io, which has two columns, every other dataframe has one column.
For example,
get_table1_3P =
get_table1_2P_max_3Io =
get_table1_3Io =
Ultimately, I would like to see the following:
I believe you first need concat with axis=1, and then you can change the order with a list of column names:
Table1_updated=pd.concat([get_table1_3P,get_table1_2P_max_3Io,get_table1_3Io], axis=1)
Table1_updated = Table1_updated[['3P','2PG-3Io','3Io']]
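A runnable sketch of the two steps, using small stand-in frames. The frame names, the extra column '2PG' and all values are hypothetical; only the three final column names come from the question.

```python
import pandas as pd

# Hypothetical stand-ins for the three tables in the question.
table_3p = pd.DataFrame({"3P": [1.0, 2.0]})
table_2p = pd.DataFrame({"2PG": [3.0, 4.0], "2PG-3Io": [5.0, 6.0]})
table_3io = pd.DataFrame({"3Io": [7.0, 8.0]})

# axis=1 places the frames side by side, aligned on the row index.
combined = pd.concat([table_3io, table_3p, table_2p], axis=1)

# Select and order the wanted columns with a list of names.
combined = combined[["3P", "2PG-3Io", "3Io"]]
print(combined)
```

The frames are deliberately passed in a different order to show that the final column order is decided by the selection list, not by the order given to concat.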

Calculate a new value using another dataframe

I am looking for a way to divide all columns in a dataframe by the values of a column from another dataframe. This can be done using either of the two options below.
df_amenity_normalized = df_amenity.apply(
lambda row: row / df_targets['Population'].loc[row.name], axis=1)
Or join the tables and then calculate:
ndf=df_amenity.merge(df_targets, left_index=True, right_index=True)
ndft=ndf.apply(lambda x: x/ndf.Population, axis='rows' )
df_amenity_normalized1 = ndft.drop(columns=['Population', 'GNI', 'GDP', 'BM Dollar', 'HDI'])
Is there any other way to achieve the same results?
Data is available here...
df_targets = pd.read_csv('https://raw.githubusercontent.com/njanakiev/osm-predict-economic-measurements/master/data/economic_measurements.csv', index_col='country')
df_targets.drop(columns='country_code', inplace=True)
df_targets = df_targets[['Population', 'GNI', 'GDP', 'BM Dollar', 'HDI']]
df_amenity = pd.read_csv('https://raw.githubusercontent.com/njanakiev/osm-predict-economic-measurements/master/data/country_amenity_counts.csv')
df_amenity.set_index('country', inplace=True)
df_amenity.drop(columns='country_code', inplace=True)
You can use the df.div() function from pandas. See below:
df_amenity.div(df_targets['Population'], axis = 0)
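Here is a small offline sketch of how div with axis=0 lines the divisor up with the rows. The frame names follow the question, but the amenity columns, country labels and numbers are invented so that the example runs without downloading the CSV files.

```python
import pandas as pd

# Invented data; the index order differs between the two frames on purpose.
df_amenity = pd.DataFrame(
    {"cafe": [10, 40], "school": [5, 20]},
    index=["X", "Y"],
)
df_targets = pd.DataFrame({"Population": [100, 200]}, index=["Y", "X"])

# axis=0 matches the Series index against the row index of df_amenity,
# so each row is divided by the population with the same label.
normalized = df_amenity.div(df_targets["Population"], axis=0)
print(normalized)
```

Because the match is by index label, the row order of the two frames does not have to agree. A label present in only one of them would produce NaN in the result.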

define aggfunc with two columns as arguments in pandas pivot table

I want only one value column as the result of the code below:
df = pd.DataFrame({'team':['a','a'],'balance':[100,3],'dpd':[0,60]})
df.pivot_table(index='team',values=['balance','dpd'],
aggfunc=lambda x: np.sum(np.where(x.dpd>=30,x.balance,0))/np.sum(x.balance))
This returns:
       balance       dpd
team
a     0.029126  0.029126
But what I want is a single column with a new name:
       dqratio
team
a     0.029126
I think you are looking for groupby and apply:
df.groupby('team').apply(lambda x: np.sum(np.where(x['dpd']>=30,x['balance'],0))/np.sum(x['balance'])).to_frame('dqratio')
       dqratio
team
a     0.029126
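As an alternative sketch that avoids apply, the same ratio can be computed from two grouped sums. This is not the answerer's method; the helper column name delinquent_balance is made up for the example, and the 30-day threshold is taken from the question.

```python
import pandas as pd

df = pd.DataFrame({"team": ["a", "a"], "balance": [100, 3], "dpd": [0, 60]})

# Keep the balance where dpd >= 30 and use 0 elsewhere
# ("delinquent_balance" is a hypothetical helper column).
with_flag = df.assign(
    delinquent_balance=df["balance"].where(df["dpd"] >= 30, 0)
)

# Sum both columns per team, then divide one total by the other.
sums = with_flag.groupby("team")[["delinquent_balance", "balance"]].sum()
dqratio = (sums["delinquent_balance"] / sums["balance"]).to_frame("dqratio")
print(dqratio)
```

For team a this gives 3 / 103, which matches the 0.029126 shown above.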

concat series onto dataframe with column name

I want to add a Series (s) to a Pandas DataFrame (df) as a new column. The series has more values than there are rows in the dataframe, so I am using the concat method along axis 1.
df = pd.concat((df, s), axis=1)
This works, but the new column of the dataframe representing the series is given an arbitrary numerical column name, and I would like this column to have a specific name instead.
Is there a way to add a series to a dataframe, when the series is longer than the rows of the dataframe, and with a specified column name in the resulting dataframe?
You can try Series.rename:
df = pd.concat((df, s.rename('col')), axis=1)
One option is simply to specify the name when creating the series:
example_scores = pd.Series([1,2,3,4], index=['t1', 't2', 't3', 't4'], name='example_scores')
Using the name attribute when creating the series is all I needed.
Try:
df = pd.concat((df, s.rename('CoolColumnName')), axis=1)
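A short sketch of what the rename approach produces when the series is longer than the dataframe. The data and the column name 'col' are placeholders.

```python
import pandas as pd

# Two-row frame and a three-element series (made-up values).
df = pd.DataFrame({"x": [1, 2]})
s = pd.Series([10, 20, 30])

# rename() sets the series name, which concat uses as the column label.
result = pd.concat((df, s.rename("col")), axis=1)
print(result)
```

The result has three rows. The extra row holds NaN in the original columns, so an integer column such as x is converted to float.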