calculation in new column with if and else in pandas [duplicate] - pandas

This question already has answers here:
Pandas conditional creation of a series/dataframe column
(13 answers)
Closed 3 years ago.
I have the table below:
Particulars DC Amt
AA D 50
BB D 20
CC C 30
DD D 20
EE C 10
I need the output below: if the DC column is "D", TTL should hold
the same amount as the "Amt" column, and if the DC column is "C",
TTL should be Amt multiplied by (-1).
Particulars DC Amt TTL
AA D 50 50
BB D 20 20
CC C 30 (30)
DD D 20 20
EE C 10 (10)

You can use np.where:
df['TTL'] = np.where(df.DC == 'D', df.Amt, -1*df.Amt)
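For reference, here is a minimal self-contained sketch of the np.where approach on the sample table from the question. It assumes the parentheses in the desired output, such as (30), are accounting notation for negative numbers, so TTL is stored as a plain negative integer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Particulars": ["AA", "BB", "CC", "DD", "EE"],
    "DC": ["D", "D", "C", "D", "C"],
    "Amt": [50, 20, 30, 20, 10],
})

# Keep Amt where DC is "D", otherwise negate it.
# Assumption: DC only ever contains "D" or "C".
df["TTL"] = np.where(df["DC"] == "D", df["Amt"], -df["Amt"])
print(df)
```

If DC could hold other values, np.select with an explicit default may be a safer choice than a two-way np.where.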

Related

Pandas create new column with specific row values from dict

I have a dataframe:
ID val
1 a
2 b
3 c
4 d
5 a
7 d
6 v
8 j
9 k
10 a
I have a dictionary as follows:
{'aa': 3, 'bb': 3, 'cc': 4}
In the dictionary, the numerical values indicate a number of records. The sum of the values equals the number of rows in the data frame. In this example 3 + 3 + 4 = 10, and the data frame has 10 rows.
I am trying to split the data frame into consecutive groups of rows whose sizes match the dictionary values, and to write the corresponding key into a new column. The desired output is as follows:
ID val new_col
1 a aa
2 b aa
3 c aa
4 d bb
5 a bb
6 v bb
7 d cc
8 j cc
9 k cc
10 a cc
The order of the fill is not important as long as the number of records per key matches the count given in the dict. I tried to solve this by iterating through the dict, but I was not able to isolate the right number of rows for each key-value pair.
I also tried pd.cut, using the dict values as bins and the keys as labels. However, I get the error ValueError: bins must increase monotonically.
d = {'aa':3, 'bb': 3,'cc':4}
df['new_col'] = pd.Series([np.repeat(i, j) for i, j in d.items()]).explode().to_numpy()
df
Out[64]:
ID val new_col
0 1 a aa
1 2 b aa
2 3 c aa
3 4 d bb
4 5 a bb
5 7 d bb
6 6 v cc
7 8 j cc
8 9 k cc
9 10 a cc
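A possibly simpler variant of the same idea is to pass the keys and the counts straight to np.repeat, which accepts a per-element repeat count. This is only a sketch. It assumes the counts sum to the number of rows, as stated in the question, and that the dict keeps the intended key order.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 7, 6, 8, 9, 10],
    "val": list("abcdadvjka"),
})
d = {"aa": 3, "bb": 3, "cc": 4}

# Guard against a length mismatch before assigning the new column.
if sum(d.values()) != len(df):
    raise ValueError("dict counts must sum to the number of rows")

# Repeat each key by its own count: 3 x 'aa', 3 x 'bb', 4 x 'cc'.
df["new_col"] = np.repeat(list(d.keys()), list(d.values()))
print(df)
```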

keep all column after sum and groupby including empty values

I have the following dataframe:
source name cost other_c other_b
a a 7 dd 33
b a 6 gg 44
c c 3 ee 55
b a 2
d b 21 qw 21
e a 16 aq
c c 10 55
I am summing cost per source and name with:
new_df = df.groupby(['source', 'name'], as_index=False)['cost'].sum()
but it is dropping the remaining 6 columns in my dataframe. Is there a way to keep the rest of the columns? I'm not looking to add a new column, just to carry over the columns from the original dataframe.
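One possible approach, not taken from the original thread, is to give every carried-over column its own aggregation, for example 'first', which takes the first non-missing value in each group. The sketch below assumes that picking the first non-missing value is acceptable for other_c and other_b, and that the blank cells in the sample are missing values. The placement of 55 under other_b in the last row is also a guess.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blanks are assumed to be missing values.
df = pd.DataFrame({
    "source": ["a", "b", "c", "b", "d", "e", "c"],
    "name": ["a", "a", "c", "a", "b", "a", "c"],
    "cost": [7, 6, 3, 2, 21, 16, 10],
    "other_c": ["dd", "gg", "ee", None, "qw", "aq", None],
    "other_b": [33, 44, 55, np.nan, 21, np.nan, 55],
})

keys = ["source", "name"]

# Carry every non-key column with 'first', then override cost with 'sum'.
agg_spec = {col: "first" for col in df.columns if col not in keys}
agg_spec["cost"] = "sum"

new_df = df.groupby(keys, as_index=False).agg(agg_spec)
print(new_df)
```

If the other columns can differ within a group, 'first' silently discards the extra values, so a different aggregation may be needed.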

Groupby sum and average in pandas and make data frame [duplicate]

This question already has answers here:
Multiple aggregations of the same column using pandas GroupBy.agg()
(4 answers)
Closed 2 years ago.
I have a dataframe as shown below
ID Score
A 20
B 60
A 40
C 50
B 100
C 60
C 40
A 10
A 10
A 70
From the above I would like to calculate the average score and the total score for each ID.
Expected output:
ID Average_score Total_score
A 30 150
B 80 160
C 50 150
Use named aggregation for custom column names:
df1 = (df.groupby('ID')
         .agg(Average_score=('Score', 'mean'),
              Total_score=('Score', 'sum'))
         .reset_index())
print(df1)
ID Average_score Total_score
0 A 30 150
1 B 80 160
2 C 50 150

Percentage calculation from pivot table pandas

I have a set of data that I imported from an Excel xlsx file. I want to find each customer segment's percentage of the total profit. I managed to use pivot_table to summarize the total profit of each customer segment, but I would also like to know the percentage. How do I do that?
Pivot_table
profit = df.pivot_table(index = ['Customer Segment'], values = ['Profit'], aggfunc=sum)
Result So far
Customer Segment Profit
A a
B b
C c
D d
Maybe adding the percentage column to the pivot table would be an ideal way. But how can I do that?
How about
df['percent'] = df['Profit']/sum(df['Profit'])
For example you have this data frame:
Customer Segment Customer Profit
0 A AAA 12
1 B BBB 43
2 C CCC 45
3 D DDD 23
4 D EEE 67
5 C FFF 21
6 B GGG 45
7 A JJJ 67
8 A KKK 32
9 B LLL 13
10 C MMM 43
11 D NNN 13
From the above data frame you want to make pivot table.
import pandas as pd
# The string 'sum' avoids the FutureWarning that recent pandas versions raise for aggfunc=np.sum.
tableframe = pd.pivot_table(df, values='Profit', index=['Customer Segment'], aggfunc='sum')
Here is your pivot table:
Profit
Customer Segment
A 111
B 101
C 109
D 103
Now add another column to tableframe that holds the percentage of the total profit.
tableframe['percentage'] = ((tableframe.Profit / tableframe.Profit.sum()) * 100)
Here is your final tableframe:
Profit percentage
Customer Segment
A 111 26.179245
B 101 23.820755
C 109 25.707547
D 103 24.292453
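Putting the steps together, a self-contained sketch using the example data above might look like this. It passes the string 'sum' as aggfunc and leaves out the Customer column, which the pivot does not use.

```python
import pandas as pd

# Example data from the answer above (Customer column omitted).
df = pd.DataFrame({
    "Customer Segment": ["A", "B", "C", "D", "D", "C",
                         "B", "A", "A", "B", "C", "D"],
    "Profit": [12, 43, 45, 23, 67, 21, 45, 67, 32, 13, 43, 13],
})

# Total profit per customer segment.
tableframe = pd.pivot_table(
    df, values="Profit", index=["Customer Segment"], aggfunc="sum"
)

# Each segment's share of the overall profit, in percent.
tableframe["percentage"] = tableframe["Profit"] / tableframe["Profit"].sum() * 100
print(tableframe)
```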

correlation coefficient between columns of 2 dataframes [duplicate]

This question already has answers here:
Computing the correlation coefficient between two multi-dimensional arrays
(3 answers)
Closed 4 years ago.
I have two dataframes as given below.
>>> df1
c1 c2
0 10 10
1 20 11
2 40 15
3 9 20
4 13 27
>>> df2
k1 k2
0 100 100
1 200 115
2 400 159
3 80 202
4 90 270
I would like to compute the correlation between the Ks and the Cs, something like the table below:
>>> df3
c1 c2
k1 .99 -0.31
k2 -0.16 .98
Assuming the data in df3 is correct, .99 is the correlation coefficient between c1 & k1, -0.31 is between c2 & k1, and so on.
How can this be computed?
I think you need something like this:
import itertools
a = df1.columns.values
b = df2.columns.values
# Pearson correlation for every (df1 column, df2 column) pair
print([df1[u].corr(df2[v]) for u, v in itertools.product(a, b)])
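The list above is flat, so here is a sketch of one way to arrange the same pairwise correlations into the labelled df3 layout from the question. The nested dict comprehension is an illustrative choice rather than the original answer's method. It relies on both frames sharing the same index so that Series.corr aligns the rows.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"c1": [10, 20, 40, 9, 13],
                    "c2": [10, 11, 15, 20, 27]})
df2 = pd.DataFrame({"k1": [100, 200, 400, 80, 90],
                    "k2": [100, 115, 159, 202, 270]})

# Outer keys become columns (c1, c2); inner keys become the index (k1, k2).
df3 = pd.DataFrame({
    c: {k: df1[c].corr(df2[k]) for k in df2.columns}
    for c in df1.columns
})
print(df3)
```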