Dataframe group by numerical column and then combine with the original dataframe [duplicate] - pandas

This question already has answers here:
Pandas new column from groupby averages
(2 answers)
Closed 2 years ago.
I have a pandas dataframe and I would like to first group by one of the columns and calculate the mean of another column within each group. Then, I would like to combine this grouped result with the original dataframe.
An example:
df =
a b orders
1 3 5
5 8 10
2 3 6
Grouping by column b and taking the mean of orders:
groupby_df =
b mean(orders)
3 5.5
8 10
End result:
df =
a b orders mean(orders)
1 3 5 5.5
5 8 10 10
2 3 6 5.5
I know I can group by on b and then do an inner join on b, but I feel like it can be done in a cleaner, one-liner way. Is it possible to do better than that?

This is transform:
df['mean'] = df.groupby('b').orders.transform('mean')
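A minimal, runnable sketch of the transform approach on the example data from the question (the column name mean(orders) matches the desired end result):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 5, 2], 'b': [3, 8, 3], 'orders': [5, 10, 6]})

# transform broadcasts the per-group mean back onto the original rows,
# so no separate merge/join is needed
df['mean(orders)'] = df.groupby('b')['orders'].transform('mean')
print(df)
```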

Related

Transform a dataframe in this specific way [duplicate]

This question already has answers here:
Reshape Pandas DataFrame to a Series with columns prefixed with indices
(1 answer)
efficiently flatten multiple columns into a single row in pandas
(1 answer)
Closed 8 months ago.
(Please help me to rephrase the title. I looked at questions with similar titles but they are not asking the same thing.)
I have a dataframe like this:
A B C
0 1 4 7
1 2 5 8
2 3 6 9
(the first column is indexes and not important)
I need to transform it so it ends up like this:
A A-1 A-2 B B-1 B-2 C C-1 C-2
1 2 3 4 5 6 7 8 9
I know about DataFrame.T which seems one step in the right direction, but how do I programmatically change the column headers and move the rows "beside each other" to make it a single row?
First use DataFrame.unstack, convert the values to a one-column DataFrame with Series.to_frame, and transpose. Finally, flatten the MultiIndex in a list comprehension with an if-else for the expected output:
df1 = df.unstack().to_frame().T
df1.columns = [a if b == 0 else f'{a}-{b}' for a, b in df1.columns]
print(df1)
A A-1 A-2 B B-1 B-2 C C-1 C-2
0 1 2 3 4 5 6 7 8 9
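If the MultiIndex step feels heavy, a hedged alternative sketch (same result under the same assumptions) flattens the values column by column with NumPy and builds the headers directly:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# column-major flatten: all of A, then all of B, then all of C
values = df.to_numpy().T.ravel()
# first value of each column keeps the bare name, the rest get a -<row> suffix
cols = [c if i == 0 else f'{c}-{i}' for c in df.columns for i in range(len(df))]
df1 = pd.DataFrame([values], columns=cols)
print(df1)
```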

pandas how to explode from two cells element-wise [duplicate]

This question already has answers here:
Efficient way to unnest (explode) multiple list columns in a pandas DataFrame
(7 answers)
Closed 9 months ago.
I have a dataframe:
df =
A B C
1 [2,3] [4,5]
And I want to explode it element-wise based on [B,C] to get:
df =
A B C
1 2 4
1 3 5
What is the best way to do so?
B and C always have the same length.
Thanks
Try, in pandas 1.3.2:
df.explode(['B', 'C'])
Output:
A B C
0 1 2 4
0 1 3 5
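Multi-column explode was added in pandas 1.3, so on older versions a commonly used workaround (a sketch, assuming B and C always hold equal-length lists, as stated) is to explode each list column via apply:

```python
import pandas as pd

df = pd.DataFrame({'A': [1], 'B': [[2, 3]], 'C': [[4, 5]]})

# set the scalar column as the index, explode the list columns row-wise,
# then restore the index; alignment works because B and C have equal lengths
out = df.set_index('A').apply(pd.Series.explode).reset_index()
print(out)
```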

Remove a string from certain column values and then operate them Pandas

I have a dataframe with a column named months (as below), but it contains some values given as "x years". So I want to remove the word "years" and multiply those values by 12 so the whole column is consistent.
index months
1 5
2 7
3 3 years
3 9
4 10 years
I tried with
if df['months'].str.contains("years")==True:
df['df'].str.rstrip('years').astype(float) * 12
But it's not working
You can create a multiplier array based on which rows contain "years" and multiply those months by 12:
import numpy as np

multiplier = np.where(df['months'].str.contains('years'), 12, 1)
df['months'] = df['months'].str.replace('years', '').astype(int) * multiplier
You get
index months
0 1 5
1 2 7
2 3 36
3 3 9
4 4 120
Or slice with a boolean mask and then use replace():
mask = df['months'].str.contains("years")
df.loc[mask, 'months'] = df.loc[mask, 'months'].str.replace("years", "").astype(float) * 12
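Putting the first answer together as a runnable sketch (assuming the column is read in as strings, as the mixed values suggest; .str.strip() guards against leftover whitespace after removing "years"):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'months': ['5', '7', '3 years', '9', '10 years']})

# 12 where the value mentions years, 1 otherwise
multiplier = np.where(df['months'].str.contains('years'), 12, 1)
df['months'] = (df['months'].str.replace('years', '', regex=False)
                            .str.strip()
                            .astype(int) * multiplier)
print(df)
```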

Remove all rows which each value is the same [duplicate]

This question already has answers here:
How do I get a list of all the duplicate items using pandas in python?
(13 answers)
Closed 3 years ago.
I want to drop all rows in which every value is the same. I tried drop_duplicates(subset=['other_things','Dist_1','Dist_2']) but could not get it to work.
Input
id other_things Dist_1 Dist_2
1 a a a
2 a b a
3 10 10 10
4 a b a
5 8 12 48
6 8 12 48
Expected
id other_things Dist_1 Dist_2
2 a b a
4 a b a
5 8 12 48
6 8 12 48
Try
df = df.drop_duplicates()
It looks like the 'id' column could be generating problems.
I would recommend using the 'subset' parameter of drop_duplicates, as per the documentation:
drop_duplicates documentation
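Note that the expected output keeps the duplicated rows 2/4 and 5/6 and instead drops the rows whose three values are all identical, which drop_duplicates alone will not reproduce. A sketch that matches the expected output (column names taken from the question) filters on the per-row number of distinct values:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'other_things': ['a', 'a', 10, 'a', 8, 8],
    'Dist_1': ['a', 'b', 10, 'b', 12, 12],
    'Dist_2': ['a', 'a', 10, 'a', 48, 48],
})

cols = ['other_things', 'Dist_1', 'Dist_2']
# keep only rows where the three columns are not all the same value
out = df[df[cols].nunique(axis=1) > 1]
print(out)
```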

Frequency count for each column in pandas dataset using a generalised approach [duplicate]

This question already has answers here:
Get total of Pandas column
(5 answers)
Pandas: sum DataFrame rows for given columns
(8 answers)
Closed 4 years ago.
I have a pandas dataframe with binary values in its columns, as below.
Id Label1 Label2 ......
1 0 1
2 1 1
3 1 1
4 1 1
The output I am looking for is as follows
Label1
3
Label2
4
How do I do this with code that reads each column in the whole pandas dataframe and computes the sum for that column? I know how to do it for one column.
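A generalised sketch (assuming Id is the only non-label column, as in the example): summing a 0/1 column counts its 1s, and DataFrame.sum does this for every column at once:

```python
import pandas as pd

df = pd.DataFrame({'Id': [1, 2, 3, 4],
                   'Label1': [0, 1, 1, 1],
                   'Label2': [1, 1, 1, 1]})

# sum every column except the identifier; each sum is the count of 1s
counts = df.drop(columns='Id').sum()
print(counts)
```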