rename results in data frame pandas - pandas

In a pandas data frame, I am trying to run some statistical analysis on one column (Heart Rate). I group by patient id and hour of measurement, then compute the statistics (mean, max, etc.).
My question is how to rename the returned columns (for example sum_heart_rate instead of sum, min_heart_rate instead of min).
My code is as follows:
newdataframe = df2.groupby(['DayHour', 'subject_id']).agg({"Heart Rate": ['sum', 'min', 'max', 'std', 'count', 'var', 'skew']})

You can use the template below (named aggregation, available from pandas 0.25). Add more columns if needed.
newdataframe = df2.groupby(['DayHour', 'subject_id']).agg(sum_heart_rate=('Heart Rate', 'sum'), min_heart_rate=('Heart Rate', 'min'))
For pandas versions below 0.25, use the code below:
newdataframe = df2.groupby(['DayHour', 'subject_id'])['Heart Rate'].agg([('sum_heart_rate', 'sum'), ('min_heart_rate', 'min')])
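As a rough, self-contained sketch of both ideas: the sample values below are made up for illustration, and only the column names come from the question. The first part uses named aggregation; the second keeps the question's dict-style call and flattens the resulting two-level column names afterwards, which may be handy when many statistics are requested.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df2 = pd.DataFrame({
    "DayHour": [0, 0, 0, 1, 1],
    "subject_id": [1, 1, 2, 1, 1],
    "Heart Rate": [60, 70, 80, 90, 100],
})

# Named aggregation (pandas >= 0.25): output_name=(input_column, function).
newdataframe = (
    df2.groupby(["DayHour", "subject_id"])
       .agg(
           sum_heart_rate=("Heart Rate", "sum"),
           min_heart_rate=("Heart Rate", "min"),
           max_heart_rate=("Heart Rate", "max"),
       )
       .reset_index()
)

# Alternative: keep the dict-style call and rename the MultiIndex columns.
stats = df2.groupby(["DayHour", "subject_id"]).agg(
    {"Heart Rate": ["sum", "min", "max"]}
)
# Each column label is a tuple such as ("Heart Rate", "sum").
stats.columns = [f"{func}_heart_rate" for _, func in stats.columns]

print(newdataframe)
print(stats)
```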

Related

How to create pivot table from non-numerical dataset by counting the instances from one column?

I have this dataset that looks like this:
I have tried to do this:
df.groupby(['Phase','frames','Origin_Type']).size()
and
pd.pivot_table(india, values = ['frames', 'Phase', 'Origin_Type'], index =['frames'],
columns = ['Phase', 'Origin_Type'], aggfunc = sum)
But neither gave me the right result. I want to transform it to this (see pic below), where the values should be the count of each theme found in each 'Origin_Type' per phase.
You can use pd.crosstab here, which counts the occurrences of each combination:
pd.crosstab(india['Location'],[india['Phase'], india['Origin_Type']])
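A small sketch of what the crosstab call produces, assuming a data frame with the same three column names; the values are invented, since the original dataset is not shown. Each cell is the number of rows with that Location, Phase and Origin_Type combination.

```python
import pandas as pd

# Hypothetical rows; the real dataset from the question is not available.
india = pd.DataFrame({
    "Location": ["A", "A", "B", "B", "B"],
    "Phase": ["P1", "P1", "P1", "P2", "P2"],
    "Origin_Type": ["X", "Y", "X", "X", "X"],
})

# Rows: Location. Columns: (Phase, Origin_Type) pairs. Values: row counts.
counts = pd.crosstab(india["Location"], [india["Phase"], india["Origin_Type"]])
print(counts)
```

Combinations that never occur show up as 0 rather than being dropped, which is usually what a count table needs.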

pandas pivot table calculate difference and sort result

I have a pivot table like this in pandas:
I want to add a column [DIFFERENCE] and sort the table by that new [DIFFERENCE] column.
I have played around with table.diff(axis=1) but somehow can't get the sorting to work.
Any ideas are very much appreciated.
Creating a column based on the difference is straightforward, and sorting is possible with sort_values:
df['Difference'] = df['2021'] - df['2020']
df.sort_values('Difference', inplace=True)
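A minimal runnable sketch of the same two steps, assuming the pivot table has string column labels '2020' and '2021' (if the labels are integers, index with 2020 and 2021 instead). The numbers and row labels are made up.

```python
import pandas as pd

# Hypothetical pivot-table-like frame with one column per year.
table = pd.DataFrame(
    {"2020": [10, 40, 25], "2021": [30, 35, 60]},
    index=["a", "b", "c"],
)

# Year-over-year difference, then sort with the largest increase first.
table["Difference"] = table["2021"] - table["2020"]
table = table.sort_values("Difference", ascending=False)
print(table)
```

Assigning the sorted result back avoids inplace=True and makes it easy to chain further steps.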

Using pandas in JupyterLab, how can I change the value of a column in a data frame from a float 2379.77 to a currency value $2379.77?

I am working on an assignment and I am trying to display one of the columns in my data frame as currency ($). After all the calculations the data is displayed as a float, but I want to style it as a currency value since it represents Total Revenue.
I would really appreciate it if someone could help me out with this one. My code is below, along with a screenshot of the summary table it returns.
# I created variables to hold the values to later create the summary table with them. I needed to go back and look at the decimal places that were used in the solution. To be able to match the solution format I decided to use the method round()
ItemCount = df["Item Name"].nunique()
AveragePrice = round(df["Price"].mean(),2)
PurchasedNumber = df["Purchase ID"].count()
Revenue = round(df["Price"].sum(),2)
#After I created the variables I need to store them in a summary table like so:
SummaryTable = pd.DataFrame([{"Number of Unique Items": ItemCount, "Average Price": AveragePrice, "Number of Purchases": PurchasedNumber, "Total Revenue": Revenue}])
SummaryTable
Try this (note that the column then holds strings, so do it after all calculations):
SummaryTable["Total Revenue"] = SummaryTable["Total Revenue"].map("${:.2f}".format)
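A short sketch of the formatting step on a one-row summary table. Only the 2379.77 figure comes from the question; the other values are placeholders. Formatting a copy keeps the numeric table available for further calculations.

```python
import pandas as pd

# Placeholder summary values; only Total Revenue matches the question.
SummaryTable = pd.DataFrame([{
    "Number of Unique Items": 183,
    "Average Price": 3.05,
    "Number of Purchases": 780,
    "Total Revenue": 2379.77,
}])

# Format a copy so the original float column stays usable for math.
formatted = SummaryTable.copy()
# "${:.2f}" always shows two decimals; use "${:,.2f}" for thousands separators.
formatted["Total Revenue"] = formatted["Total Revenue"].map("${:.2f}".format)
print(formatted)
```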

Pandas column classification

I am working on a Fitbit data set that has a user Id column (35 users) and a number of activity columns covering different dates. I need to group all the other columns by the user Id column in order to perform my analysis. Can anyone help with this?
What you're looking for is the function groupby(). You can group by one or more columns and then perform aggregations per group on the rest of the columns:
df.groupby(['id', 'date']).agg([...])
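As an illustration of that pattern, here is a small sketch with invented data. The column names steps and calories are assumptions, since the question does not list the activity columns.

```python
import pandas as pd

# Hypothetical activity log; column names other than id/date are assumed.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "date": ["2016-04-12", "2016-04-12", "2016-04-13", "2016-04-12", "2016-04-12"],
    "steps": [1000, 500, 2000, 300, 700],
    "calories": [50.0, 25.0, 90.0, 20.0, 40.0],
})

# One row per (user, day), with a different aggregation per activity column.
per_user_day = (
    df.groupby(["id", "date"])
      .agg(total_steps=("steps", "sum"), mean_calories=("calories", "mean"))
      .reset_index()
)
print(per_user_day)
```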

Computing grouped medians in DolphinDB

I have a DFS table in DolphinDB. I tried to run a query that would compute grouped medians on this table. But it just threw an exception.
select median(col1) from t group by col2
The aggregated function in column med(v1) doesn't have a map-reduce implementation and can't be applied to a partitioned or distributed table.
It seems to me that DolphinDB does not support a distributed median algorithm.
The aggregate function median differs from average in that it can't be computed by map-reduce: partial results from each partition cannot be combined into the exact answer. So we have to pull the data together by group and then apply the median function to each group.
DolphinDB's repartition mechanism makes such work much easier.
ds = repartitionDS(<select first(col2) as col2, median(col1) as col1 from t>,`col2, VALUE)
mr(ds, x->x,,unionAll{false})
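To illustrate why median resists map-reduce while average does not, here is a small sketch in plain Python (not DolphinDB script) with made-up partitions. The average can be rebuilt exactly from per-partition sums and counts, whereas combining per-partition medians generally gives a different number from the true median.

```python
from statistics import mean, median

# Hypothetical partitions of one column, as if stored on different nodes.
partitions = [[1, 2, 100], [3, 4], [5, 6, 7, 8]]
all_values = [v for part in partitions for v in part]

# Average: each partition emits (sum, count); the reducer combines them exactly.
partials = [(sum(p), len(p)) for p in partitions]
combined_mean = sum(s for s, _ in partials) / sum(n for _, n in partials)

# Median: a median of per-partition medians is not the overall median.
median_of_medians = median(median(p) for p in partitions)
true_median = median(all_values)

print(combined_mean, mean(all_values))  # the two averages agree
print(median_of_medians, true_median)   # the two medians differ
```

This is why the answer above first regroups the rows so that every group sits in a single partition before median is applied.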