Add new column based on groupby map in pandas [duplicate] - pandas

This question already has answers here:
Remap values in pandas column with a dict, preserve NaNs
(11 answers)
Closed 2 years ago.
I have a df as shown below
product bought_date number_of_sales
A 2016 15
A 2017 10
A 2018 15
B 2016 20
B 2017 30
B 2018 20
C 2016 20
C 2017 30
C 2018 20
From the above I would like to add one column called cost_per_unit as shown below.
cost of product A is 100, B is 500 and C is 200
d1 = {'A':100, 'B':500, 'C':'200'}
Expected Output:
product bought_date number_of_sales cost_per_unit
A 2016 15 100
A 2017 10 100
A 2018 15 100
B 2016 20 500
B 2017 30 500
B 2018 20 500
C 2016 20 200
C 2017 30 200
C 2018 20 200

No need for any lambda function. Run just:
df['cost_per_unit'] = df['product'].map(d1)
Additional remark: product is a name of a Pandas function. You should avoid
column names "covering" existing functions or attributes.
It is a good habit, that they should differ, at least in char case.

You can try this:
df['cost_per_unit'] = df.apply(lambda x: d1[x['product']], axis=1)
print(df)
product bought_date number_of_sales cost_per_unit
0 A 2016 15 100
1 A 2017 10 100
2 A 2018 15 100
3 B 2016 20 500
4 B 2017 30 500
5 B 2018 20 500
6 C 2016 20 200
7 C 2017 30 200
8 C 2018 20 200

Related

Loop Record in Oracle

I have a Table called TaxAmount. It has 3 columns(ID, Year, Amount). refer the below image.
I want to divide each row into 12 months. I attached a sample image below.
I'm new in Oracle side. please help me to write a Oracle Query to display the above result.
I tried ROWNUM. But No luck.
Here's one option:
SQL> select id, year, column_value as month, amount
2 from taxamount cross join
3 table(cast(multiset(select level from dual
4 connect by level <= 12
5 ) as sys.odcinumberlist))
6 order by id, year, month;
ID YEAR MONTH AMOUNT
---------- ---------- ---------- ----------
1 2022 1 100
1 2022 2 100
1 2022 3 100
1 2022 4 100
1 2022 5 100
1 2022 6 100
1 2022 7 100
1 2022 8 100
1 2022 9 100
1 2022 10 100
1 2022 11 100
1 2022 12 100
2 2022 1 200
2 2022 2 200
2 2022 3 200
2 2022 4 200
2 2022 5 200
2 2022 6 200
2 2022 7 200
2 2022 8 200
2 2022 9 200
2 2022 10 200
2 2022 11 200
2 2022 12 200
3 2022 1 150
3 2022 2 150
3 2022 3 150
3 2022 4 150
3 2022 5 150
3 2022 6 150
3 2022 7 150
3 2022 8 150
3 2022 9 150
3 2022 10 150
3 2022 11 150
3 2022 12 150
36 rows selected.
SQL>

Transposing multiple related columns

While transposing single columns is pretty straight forward I need to transpose a large amount of data with 3 sets of , 10+ related columns needed to be transposed.
create table test
(month int,year int,po1 int,po2 int,ro1 int,ro2 int,mo1 int,mo2 int, mo3 int);
insert into test
values
(5,2013,100,20,10,1,3,4,5),(4,2014,200,30,20,2,4,5,6),(6,2015,200,80,30,3,5,6,7) ;
select * FROM test;
gives
month
year
po1
po2
ro1
ro2
mo1
mo2
mo3
5
2013
100
20
10
1
3
4
5
4
2014
200
30
20
2
4
5
6
6
2015
200
80
30
3
5
6
7
Transposing using UNPIVOT
select
month, year,
PO, RO, MO
from ( SELECT * from test) src
unpivot
( PO for Description in (po1, po2))unpiv1
unpivot
(RO for Description1 in (ro1, ro2)) unpiv2
unpivot
(MO for Description2 in (mo1, mo2, mo3)) unpiv3
order by year
Gives me this
month
year
PO
RO
MO
5
2013
100
10
3
5
2013
100
10
4
5
2013
100
10
5
5
2013
100
1
3
5
2013
100
1
4
5
2013
100
1
5
5
2013
20
10
3
5
2013
20
10
4
5
2013
20
10
5
5
2013
20
1
3
5
2013
20
1
4
5
2013
20
1
5
4
2014
200
20
4
4
2014
200
20
5
4
2014
200
20
6
4
2014
200
2
4
4
2014
200
2
5
4
2014
200
2
6
4
2014
30
20
4
4
2014
30
20
5
4
2014
30
20
6
4
2014
30
2
4
4
2014
30
2
5
4
2014
30
2
6
6
2015
200
30
5
6
2015
200
30
6
6
2015
200
30
7
6
2015
200
3
5
6
2015
200
3
6
6
2015
200
3
7
6
2015
80
30
5
6
2015
80
30
6
6
2015
80
30
7
6
2015
80
3
5
6
2015
80
3
6
6
2015
80
3
7
I will like to turn it to something like this. Is that possible?
month
year
PO
RO
MO
5
2013
100
10
3
5
2013
20
1
4
5
2013
0
0
5
4
2014
200
20
4
4
2014
30
2
5
4
2014
0
0
6
6
2015
200
30
5
6
2015
80
3
6
6
2015
0
0
7
Maybe use a query like below which creates rows as per your design using CROSS APPLY
select month,year,po,ro,mo from
test cross apply
(values (po1,ro1,mo1), (po2,ro2,mo2),(0,0,mo3))v(po,ro,mo)
see demo here
Unpivot acts similar as union,Use union all in your case
SELECT month,
year,
po1 AS PO,
ro1 AS RO,
mo1 AS MO
FROM test
UNION ALL
SELECT month,
year,
po2,
ro2,
mo2
FROM test
UNION ALL
SELECT month,
year,
0,
0,
mo2
FROM test

Year wise aggregation on the given condition in pandas

I have a data frame as shown below. which is a sales data of two health care product starting from December 2016 to November 2018.
product price sale_date discount
A 50 2016-12-01 5
A 50 2017-01-03 4
B 200 2016-12-24 10
A 50 2017-01-18 3
B 200 2017-01-28 15
A 50 2017-01-18 6
B 200 2017-01-28 20
A 50 2017-04-18 6
B 200 2017-12-08 25
A 50 2017-11-18 6
B 200 2017-08-21 20
B 200 2017-12-28 30
A 50 2018-03-18 10
B 300 2018-06-08 45
B 300 2018-09-20 50
A 50 2018-11-18 8
B 300 2018-11-28 35
From the above I would like to prepare below data frame
Expected Output:
product year number_of_months total_price total_discount number_of_sales
A 2016 1 50 5 1
B 2016 1 200 10 1
A 2017 12 250 25 5
B 2017 12 1000 110 5
A 2018 11 100 18 2
B 2018 11 900 130 3
Note: Please note that the data starts from Dec 2016 to Nov 2018.
So number of months in 2016 is 1, in 2017 we have full data so 12 months and 2018 we have 11 months.
First aggregate sum by years and product and then create new column for counts by months by DataFrame.insert and Series.map:
df1 =(df.groupby(['product',df['sale_date'].dt.year], sort=False).sum().add_prefix('total_')
.reset_index())
df1.insert(2,'number_of_months', df1['sale_date'].map({2016:1, 2017:12, 2018:11}))
print (df1)
product sale_date number_of_months total_price total_discount
0 A 2016 1 50 5
1 A 2017 12 250 25
2 B 2016 1 200 10
3 B 2017 12 1000 110
4 A 2018 11 100 18
5 B 2018 11 900 130
If want dynamic dictionary by minumal and maximal datetimes use:
s = pd.date_range(df['sale_date'].min(), df['sale_date'].max(), freq='MS')
d = s.year.value_counts().to_dict()
print (d)
{2017: 12, 2018: 11, 2016: 1}
df1 = (df.groupby(['product',df['sale_date'].dt.year], sort=False).sum().add_prefix('total_')
.reset_index())
df1.insert(2,'number_of_months', df1['sale_date'].map(d))
print (df1)
product sale_date number_of_months total_price total_discount
0 A 2016 1 50 5
1 A 2017 12 250 25
2 B 2016 1 200 10
3 B 2017 12 1000 110
4 A 2018 11 100 18
5 B 2018 11 900 130
For ploting is used DataFrame.set_index with DataFrame.unstack:
df2 = (df1.set_index(['sale_date','product'])[['total_price','total_discount']]
.unstack(fill_value=0))
df2.columns = df2.columns.map('_'.join)
print (df2)
total_price_A total_price_B total_discount_A total_discount_B
sale_date
2016 50 200 5 10
2017 250 1000 25 110
2018 100 900 18 130
df2.plot()
EDIT:
df1 = (df.groupby(['product',df['sale_date'].dt.year], sort=False)
.agg( total_price=('price','sum'),
total_discount=('discount','sum'),
number_of_sales=('discount','size'))
.reset_index())
df1.insert(2,'number_of_months', df1['sale_date'].map({2016:1, 2017:12, 2018:11}))
print (df1)
product sale_date number_of_months total_price total_discount \
0 A 2016 NaN 50 5
1 A 2017 NaN 250 25
2 B 2016 NaN 200 10
3 B 2017 NaN 1000 110
4 A 2018 NaN 100 18
5 B 2018 NaN 900 130
number_of_sales
0 1
1 5
2 1
3 5
4 2
5 3

pandas- return Month containing Max value for each year

I have a dataframe like:
Year Month Value
2017 1 100
2017 2 1
2017 4 2
2018 3 88
2018 4 8
2019 5 87
2019 6 1
I'd the dataframe to return the Month and Value for each year where the value is the maximum:
year month value
2017 1 100
2018 3 88
2019 5 87
I've attempted something like df=df.groupby(["Year","Month"])['Value']).max() however, it returns the full data set because each Year / Month pair is unique (i believe).
You can get the index where the top Value occurs with .groupby(...).idxmax() and use that to index into the original dataframe:
In [28]: df.loc[df.groupby("Year")["Value"].idxmax()]
Out[28]:
Year Month Value
0 2017 1 100
3 2018 3 88
5 2019 5 87
Here is a solution that also handles duplicate possibility:
m = df.groupby('Year')['Value'].transform('max') == df['Value']
dfmax = df.loc[m]
Full example:
import pandas as pd
data = '''\
Year Month Value
2017 1 100
2017 2 1
2017 4 2
2018 3 88
2018 4 88
2019 5 87
2019 6 1'''
fileobj = pd.compat.StringIO(data)
df = pd.read_csv(fileobj, sep='\s+')
m = df.groupby('Year')['Value'].transform('max') == df['Value']
print(df[m])
Year Month Value
0 2017 1 100
3 2018 3 88
4 2018 4 88
5 2019 5 87

Pandas: Group by two columns to get sum of another column

I look most of the previously asked questions but was not able to find answer for my question:
I have following data.frame
id year month score num_attempts
0 483625 2010 01 50 1
1 967799 2009 03 50 1
2 213473 2005 09 100 1
3 498110 2010 12 60 1
5 187243 2010 01 100 1
6 508311 2005 10 15 1
7 486688 2005 10 50 1
8 212550 2005 10 500 1
10 136701 2005 09 25 1
11 471651 2010 01 50 1
I want to get following data frame
year month sum_score sum_num_attempts
2009 03 50 1
2005 09 125 2
2010 12 60 1
2010 01 200 2
2005 10 565 3
Here is what I tried:
sum_df = df.groupby(by=['year','month'])['score'].sum()
But this doesn't look efficient and correct. If I have more than one column need to be aggregate this seems like a very expensive call. for example if I have another column num_attempts and just want to sum by year month as score.
This should be an efficient way:
sum_df = df.groupby(['year','month']).agg({'score': 'sum', 'num_attempts': 'sum'})