This question already has answers here:
Pandas conditional creation of a series/dataframe column
(13 answers)
Closed 3 years ago.
I have the table below:
Particulars DC Amt
AA D 50
BB D 20
CC C 30
DD D 20
EE C 10
I need the output below: if the DC column has "D", the TTL column should contain the same amount as the Amt column; if DC is "C", the Amt amount should be multiplied by -1.
Particulars DC Amt TTL
AA D 50 50
BB D 20 20
CC C 30 (30)
DD D 20 20
EE C 10 (10)
You can use np.where:
import numpy as np

df['TTL'] = np.where(df.DC == 'D', df.Amt, -1 * df.Amt)
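Applied to the sample data from the question, a quick self-contained check of this approach:

```python
import numpy as np
import pandas as pd

# Frame reproduced from the question
df = pd.DataFrame({
    "Particulars": ["AA", "BB", "CC", "DD", "EE"],
    "DC": ["D", "D", "C", "D", "C"],
    "Amt": [50, 20, 30, 20, 10],
})

# Where DC is "D" keep Amt as-is; otherwise negate it
df["TTL"] = np.where(df["DC"] == "D", df["Amt"], -df["Amt"])
print(df["TTL"].tolist())  # [50, 20, -30, 20, -10]
```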
I have a dataframe:
ID val
1 a
2 b
3 c
4 d
5 a
7 d
6 v
8 j
9 k
10 a
I have a dictionary as follows:
{'aa': 3, 'bb': 3, 'cc': 4}
In the dictionary, the numerical values indicate the number of records. The sum of these values equals the number of rows in the data frame: in this example 3 + 3 + 4 = 10, and the data frame has 10 rows.
I am trying to split the data frame into groups of rows whose sizes equal the numbers given in the dictionary, filling the corresponding key into a new column. The desired output is as follows:
ID val new_col
1 a aa
2 b aa
3 c aa
4 d bb
5 a bb
6 v bb
7 d cc
8 j cc
9 k cc
10 a cc
The order of the fill is not important as long as the record counts match the counts given in the dict. I tried iterating through the dict, but I could not isolate a specific number of records of the data frame for each new key-value pair.
I have also tried pd.cut, using the dict values as bins and the keys as column values, but I get the error ValueError: bins must increase monotonically.
import numpy as np
import pandas as pd

d = {'aa': 3, 'bb': 3, 'cc': 4}
df['new_col'] = pd.Series([np.repeat(i, j) for i, j in d.items()]).explode().to_numpy()
df
Out[64]:
ID val new_col
0 1 a aa
1 2 b aa
2 3 c aa
3 4 d bb
4 5 a bb
5 7 d bb
6 6 v cc
7 8 j cc
8 9 k cc
9 10 a cc
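The same labels can also be produced in a single step with np.repeat, which accepts a list of repeat counts. A sketch on the question's data (it assumes Python 3.7+ dict insertion order and that the counts sum to len(df)):

```python
import numpy as np
import pandas as pd

# Frame and dict reproduced from the question
df = pd.DataFrame({"ID": [1, 2, 3, 4, 5, 7, 6, 8, 9, 10],
                   "val": list("abcdadvjka")})
d = {"aa": 3, "bb": 3, "cc": 4}

# Repeat each key by its count; the total must equal len(df)
df["new_col"] = np.repeat(list(d), list(d.values()))
print(df["new_col"].tolist())  # ['aa', 'aa', 'aa', 'bb', 'bb', 'bb', 'cc', 'cc', 'cc', 'cc']
```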
I have the following dataframe:
source name cost other_c other_b
a a 7 dd 33
b a 6 gg 44
c c 3 ee 55
b a 2
d b 21 qw 21
e a 16 aq
c c 10 55
I am doing a sum of name and source with:
new_df = df.groupby(['source', 'name'], as_index=False)['cost'].sum()
but it is dropping the remaining 6 columns in my dataframe. Is there a way to keep the rest of the columns? I'm not looking to add a new column, just to carry over the columns from the original dataframe.
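One common way to keep every column, sketched here (not taken from the original thread), is groupby().transform, which returns one value per original row, so nothing is dropped:

```python
import pandas as pd

# Abbreviated version of the question's data
df = pd.DataFrame({
    "source":  ["a", "b", "c", "b"],
    "name":    ["a", "a", "c", "a"],
    "cost":    [7, 6, 3, 2],
    "other_c": ["dd", "gg", "ee", None],
})

# transform broadcasts the group sum back onto each row,
# so every other column survives unchanged
df["cost"] = df.groupby(["source", "name"])["cost"].transform("sum")
print(df)
```

Rows sharing the same (source, name) pair, like the two (b, a) rows, end up with the same summed cost while other_c is carried over untouched.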
This question already has answers here:
Multiple aggregations of the same column using pandas GroupBy.agg()
(4 answers)
Closed 2 years ago.
I have a dataframe as shown below
ID Score
A 20
B 60
A 40
C 50
B 100
C 60
C 40
A 10
A 10
A 70
From the above, I would like to calculate the average score and the total score for each ID.
Expected output:
ID Average_score Total_score
A 30 150
B 80 160
C 50 150
Use named aggregation for custom columns names:
df1 = (df.groupby('ID').agg(Average_score=('Score', 'mean'),
                            Total_score=('Score', 'sum'))
         .reset_index())
print(df1)
ID Average_score Total_score
0 A 30 150
1 B 80 160
2 C 50 150
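An equivalent route without named aggregation, sketched on the question's data, is to pass a list of functions to agg and rename the resulting columns afterwards:

```python
import pandas as pd

# Frame reproduced from the question
df = pd.DataFrame({"ID": list("ABACBCCAAA"),
                   "Score": [20, 60, 40, 50, 100, 60, 40, 10, 10, 70]})

out = (df.groupby("ID")["Score"]
         .agg(["mean", "sum"])
         .rename(columns={"mean": "Average_score", "sum": "Total_score"})
         .reset_index())
print(out)
```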
I have a dataset which I have already imported from an Excel .xlsx file. I want to find the percentage of total profit contributed by each customer segment. I managed to use pivot_table to summarize the total profit of each customer segment, but I would also like to know the percentage. How do I do that?
Pivot_table
profit = df.pivot_table(index = ['Customer Segment'], values = ['Profit'], aggfunc=sum)
Result So far
Customer Segment Profit
A a
B b
C c
D d
Maybe adding the percentage column to the pivot table would be an ideal way. But how can I do that?
How about adding the percentage to the pivot result:
profit['percent'] = profit['Profit'] / profit['Profit'].sum()
For example you have this data frame:
Customer Segment Customer Profit
0 A AAA 12
1 B BBB 43
2 C CCC 45
3 D DDD 23
4 D EEE 67
5 C FFF 21
6 B GGG 45
7 A JJJ 67
8 A KKK 32
9 B LLL 13
10 C MMM 43
11 D NNN 13
From the above data frame you want to make a pivot table.
import pandas as pd

tableframe = pd.pivot_table(df, values='Profit', index=['Customer Segment'], aggfunc='sum')
Here is your pivot table:
Profit
Customer Segment
A 111
B 101
C 109
D 103
Now add another column to tableframe and compute the percentage.
tableframe['percentage'] = ((tableframe.Profit / tableframe.Profit.sum()) * 100)
Here is your final tableframe:
Profit percentage
Customer Segment
A 111 26.179245
B 101 23.820755
C 109 25.707547
D 103 24.292453
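The same result can be reached without a pivot table at all; a groupby sketch on the example data above gives identical totals and percentages:

```python
import pandas as pd

# Frame reproduced from the answer's example
df = pd.DataFrame({
    "Customer Segment": list("ABCDDCBAABCD"),
    "Profit": [12, 43, 45, 23, 67, 21, 45, 67, 32, 13, 43, 13],
})

totals = df.groupby("Customer Segment")["Profit"].sum()
percentage = totals / totals.sum() * 100
print(percentage.round(6))
```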
This question already has answers here:
Computing the correlation coefficient between two multi-dimensional arrays
(3 answers)
Closed 4 years ago.
I have two dataframes as given below.
>>> df1
c1 c2
0 10 10
1 20 11
2 40 15
3 9 20
4 13 27
>>> df2
k1 k2
0 100 100
1 200 115
2 400 159
3 80 202
4 90 270
I would like to compute the correlation between the Ks and the Cs, something like given below
>>df3
c1 c2
k1 .99 -0.31
k2 -0.16 .98
Assuming the data represented in df3 is correct, .99 is the correlation coefficient between c1 & k1, -0.31 is between c2 & k1, and so on.
How can this be computed?
I think you need something like this:
import itertools

a = df1.columns.values
b = df2.columns.values
print([df1[u].corr(df2[v]) for u, v in itertools.product(a, b)])
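To get a labelled matrix shaped like the desired df3 rather than a flat list, one sketch builds it with nested comprehensions over the two column sets:

```python
import pandas as pd

# Frames reproduced from the question
df1 = pd.DataFrame({"c1": [10, 20, 40, 9, 13], "c2": [10, 11, 15, 20, 27]})
df2 = pd.DataFrame({"k1": [100, 200, 400, 80, 90], "k2": [100, 115, 159, 202, 270]})

# Rows indexed by df2's columns, columns by df1's, matching df3's layout
df3 = pd.DataFrame({c: {k: df1[c].corr(df2[k]) for k in df2.columns}
                    for c in df1.columns})
print(df3.round(2))
```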