Groupby Get Group For Loop - pandas

I have a dataframe that I need to subset by the values in the Measure column.
Measure_Group = measures.groupby('Measure'). I can call get_group() like this: CDC = Measure_Group.get_group('CDC'), but I have over 20 measures to subset. Is there a for loop or lambda function I can use with the groupby to subset all 20+ measure names in one pass instead of calling get_group() multiple times?
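One way to avoid repeated get_group() calls (a minimal sketch with hypothetical stand-in data, not from the original post): iterating over the GroupBy object yields (name, sub-DataFrame) pairs, so a single dict comprehension collects every subset in one pass.
import pandas as pd
measures = pd.DataFrame({'Measure': ['CDC', 'CDC', 'BMI'], 'Value': [1, 2, 3]})  # hypothetical stand-in data
Measure_Group = measures.groupby('Measure')
subsets = {name: group for name, group in Measure_Group}  # one sub-DataFrame per measure
CDC = subsets['CDC']  # same result as Measure_Group.get_group('CDC')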

Related

Proportions for multiple subcategories

I am trying to calculate proportions with multiple subcategories. The series is grouped by ['budget_levels', 'revenue_levels'].
I would like to calculate the proportion for each.
For example,
budget_levels=='low' & revenue_levels=='low' / budget_levels=='low'
budget_levels=='low' & revenue_levels=='medium' / budget_levels=='low'
However, I am not getting the desired output.
Is there any way I could do this calculation for each with a simple one-line code such as .apply(lambda) function?
Use value_counts to get the number of occurrences of each combination. Then group by budget_levels (the first index level) and divide the counts in each group by their sum. sort_index makes it easier to compare the groups.
df.value_counts().groupby(level=0).transform(lambda x: x / x.sum()).sort_index()
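A small worked sketch of that one-liner with hypothetical data, to show the shape of the result:
import pandas as pd
df = pd.DataFrame({'budget_levels': ['low', 'low', 'low', 'high'],
                   'revenue_levels': ['low', 'medium', 'low', 'high']})  # hypothetical data
props = df.value_counts().groupby(level=0).transform(lambda x: x / x.sum()).sort_index()
print(props)  # ('high', 'high') -> 1.0, ('low', 'low') -> 2/3, ('low', 'medium') -> 1/3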

Pandas Sequential Count of members within a group and the sum

If I want to have a sequential count within a group I can do something like
df['GID'] = df.groupby(['G_COL1','G_COL2']).cumcount()
I cannot, however, figure out how to generate a column that contains the total number of values within the group. So if the group had three members, df['GID'] would contain 0, 1 and 2, and df['COUNT'] would contain the value 3 for each of the three members.
df["count_zeros"] = pd.DataFrame((df["GID"]==0)).cumsum()
df["COUNT"] = df.groupby("count_zeros").transform(lambda x: len(x))["GID"]
I think the above gives what you want. GID restarts from zero whenever a new group begins, so the cumulative sum of those zeros (the group "starts") labels each group, and len inside transform then counts the members of each group.
As Scott Boston commented,
df["COUNT"] = df.groupby("count_zeros")['GID'].transform('count')
works and looks great :)
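For completeness, a minimal end-to-end sketch with hypothetical data, grouping directly on the key columns rather than on the derived count_zeros helper:
import pandas as pd
df = pd.DataFrame({'G_COL1': ['a', 'a', 'a', 'b'],
                   'G_COL2': ['x', 'x', 'x', 'y']})  # hypothetical data
df['GID'] = df.groupby(['G_COL1', 'G_COL2']).cumcount()                    # 0, 1, 2, 0
df['COUNT'] = df.groupby(['G_COL1', 'G_COL2'])['GID'].transform('count')   # 3, 3, 3, 1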

Can I use dataframes as input for functions?

I am currently trying to find optimal portfolio weights by optimizing a utility function that depends on those weights. I have a dataframe containing the time series of returns, named rets_optns. rets_optns has 100 groups of 8 assets (800 columns: the 1st group is columns 1 to 8, the 2nd group is columns 9 to 16, and so on). I also have a dataframe named rf_optns with 100 columns that holds the corresponding risk-free rate for each group of returns. I want to create a new dataframe of portfolio returns, using this formula: p_returns = rf_optns + sum(weights * rets_optns). It should have 100 columns, and each column should represent the returns of a portfolio composed of 8 assets belonging to the same group. I currently have:
import numpy as np
def pret(rf, weights, rets):
    return rf + np.sum(weights * (rets - rf))
It does not work.
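No answer is given above, so what follows is only a hedged sketch of one way the formula could be applied group by group; the small stand-in data (2 groups of 8 assets instead of 100), the equal weights, and the positional column slicing are all assumptions, not part of the original post.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
rets_optns = pd.DataFrame(rng.normal(size=(10, 16)))           # 2 groups x 8 assets (stand-in)
rf_optns = pd.DataFrame(rng.normal(scale=0.01, size=(10, 2)))  # one risk-free column per group
weights = np.full(8, 1 / 8)                                    # assumed equal weights

def pret(rf, weights, rets):
    # rf: Series of risk-free rates; rets: DataFrame with the group's 8 asset columns
    excess = rets.to_numpy() - rf.to_numpy()[:, None]
    return rf + (weights * excess).sum(axis=1)

p_returns = pd.DataFrame({
    g: pret(rf_optns.iloc[:, g], weights, rets_optns.iloc[:, 8 * g:8 * (g + 1)])
    for g in range(rf_optns.shape[1])
})  # one portfolio-return column per group (100 columns in the real data)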

Subtract the mean of a group for a column away from a column value

I have a companies dataset with 35 columns. The companies can belong to one of 8 different groups. For each group, how do I create a new dataframe that subtracts the group's mean of each column from the original value?
Here is an example of part of the dataset.
So, for example, for row 1 I want to subtract the mean of BANK_AND_DEP for the Consumer Markets group from the value 7204.400207. I need to do this for each column.
I assume this needs some combination of transform and a lambda, but I cannot get the syntax right.
Although it might seem counter-intuitive for this to involve a loop at all, looping through the columns themselves lets you do this as a vectorized operation, which will be quicker than .apply(). To get the value to subtract from each column, combine .groupby() and .transform(); then just subtract it.
value_cols = [c for c in df.columns if c != 'Cluster']  # skip the grouping column itself
for column in value_cols:
    df['new_' + column] = df[column] - df.groupby('Cluster')[column].transform('mean')
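A loop-free variant of the same idea (a sketch that reuses value_cols from above and assumes those columns are numeric):
demeaned = df[value_cols] - df.groupby('Cluster')[value_cols].transform('mean')
df = df.join(demeaned.add_prefix('new_'))  # adds the new_<column> columns in one shot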

recoding multiple variables in the same way

I am looking for the shortest way to recode many variables in the same way.
For example, I have a data frame where columns a, b, c are survey items and rows are observations.
d <- data.frame(a=c(1,2,3), b=c(1,3,2), c=c(1,2,1))
I want to change the values of all observations for selected columns. For instance, the value 1 in columns "a" and "c" should be replaced with the string "low", and the values 2 and 3 in these columns should be replaced with "high".
I do this often with many columns, so I am looking for a function that can do it in a very simple way, like this:
recode2(data=d, columns=a,c, "1=low, 2,3=high")
The recode function from the car package is almost what I need, but if I have 10 columns to recode I have to write it out 10 times, which is not as efficient as I want.