Year wise aggregation on the given condition in pandas - pandas

I have a data frame, shown below, containing sales data for two health care products from December 2016 to November 2018.
product price sale_date discount
A 50 2016-12-01 5
A 50 2017-01-03 4
B 200 2016-12-24 10
A 50 2017-01-18 3
B 200 2017-01-28 15
A 50 2017-01-18 6
B 200 2017-01-28 20
A 50 2017-04-18 6
B 200 2017-12-08 25
A 50 2017-11-18 6
B 200 2017-08-21 20
B 200 2017-12-28 30
A 50 2018-03-18 10
B 300 2018-06-08 45
B 300 2018-09-20 50
A 50 2018-11-18 8
B 300 2018-11-28 35
From the above, I would like to prepare the data frame below.
Expected Output:
product year number_of_months total_price total_discount number_of_sales
A 2016 1 50 5 1
B 2016 1 200 10 1
A 2017 12 250 25 5
B 2017 12 1000 110 5
A 2018 11 100 18 2
B 2018 11 900 130 3
Note: the data runs from Dec 2016 to Nov 2018.
So the number of months is 1 for 2016, 12 for 2017 (a full year of data) and 11 for 2018.

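First aggregate with sum by product and year, then add a new column with the number of months per year using DataFrame.insert and Series.map:
# Select the numeric columns explicitly; summing the datetime column can raise a TypeError in pandas 2.x
df1 = (df.groupby(['product', df['sale_date'].dt.year], sort=False)[['price', 'discount']]
.sum().add_prefix('total_')
.reset_index())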
df1.insert(2,'number_of_months', df1['sale_date'].map({2016:1, 2017:12, 2018:11}))
print (df1)
product sale_date number_of_months total_price total_discount
0 A 2016 1 50 5
1 A 2017 12 250 25
2 B 2016 1 200 10
3 B 2017 12 1000 110
4 A 2018 11 100 18
5 B 2018 11 900 130
If you want to build the dictionary dynamically from the minimum and maximum datetimes, use:
s = pd.date_range(df['sale_date'].min(), df['sale_date'].max(), freq='MS')
d = s.year.value_counts().to_dict()
print (d)
{2017: 12, 2018: 11, 2016: 1}
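# Select the numeric columns explicitly; summing the datetime column can raise a TypeError in pandas 2.x
df1 = (df.groupby(['product', df['sale_date'].dt.year], sort=False)[['price', 'discount']]
.sum().add_prefix('total_')
.reset_index())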
df1.insert(2,'number_of_months', df1['sale_date'].map(d))
print (df1)
product sale_date number_of_months total_price total_discount
0 A 2016 1 50 5
1 A 2017 12 250 25
2 B 2016 1 200 10
3 B 2017 12 1000 110
4 A 2018 11 100 18
5 B 2018 11 900 130
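One caveat with the date_range approach: freq='MS' only yields month-start dates that fall inside the range, so if the earliest sale is not on the 1st of a month, that first month is dropped. Below is a minimal sketch of a variant based on pd.period_range, which counts every calendar month the range touches. The helper name months_per_year and the sample dates are made up for illustration.

```python
import pandas as pd


def months_per_year(dates: pd.Series) -> dict:
    """Count calendar months per year between the earliest and latest date."""
    # Monthly periods include the first month even when the earliest
    # date is not the 1st of that month.
    periods = pd.period_range(dates.min(), dates.max(), freq='M')
    counts = pd.Series(periods.year).value_counts()
    return {int(year): int(n) for year, n in counts.items()}


# Hypothetical dates: the first sale falls on the 5th, not the 1st.
sales = pd.Series(pd.to_datetime(['2016-12-05', '2017-06-10', '2018-11-28']))

d = months_per_year(sales)
print(d)

# For comparison, the month-start range skips December 2016 here.
month_starts = pd.date_range(sales.min(), sales.max(), freq='MS')
print(month_starts.min())
```

With the data in the question both approaches agree, because the first sale is on 2016-12-01.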
For plotting, use DataFrame.set_index with DataFrame.unstack:
df2 = (df1.set_index(['sale_date','product'])[['total_price','total_discount']]
.unstack(fill_value=0))
df2.columns = df2.columns.map('_'.join)
print (df2)
total_price_A total_price_B total_discount_A total_discount_B
sale_date
2016 50 200 5 10
2017 250 1000 25 110
2018 100 900 18 130
df2.plot()
EDIT: To also get the number of sales, use named aggregation:
df1 = (df.groupby(['product',df['sale_date'].dt.year], sort=False)
.agg( total_price=('price','sum'),
total_discount=('discount','sum'),
number_of_sales=('discount','size'))
.reset_index())
df1.insert(2,'number_of_months', df1['sale_date'].map({2016:1, 2017:12, 2018:11}))
print (df1)
product sale_date number_of_months total_price total_discount number_of_sales
0 A 2016 1 50 5 1
1 A 2017 12 250 25 5
2 B 2016 1 200 10 1
3 B 2017 12 1000 110 5
4 A 2018 11 100 18 2
5 B 2018 11 900 130 3
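Putting the steps together, here is a self-contained sketch that rebuilds the sample data from the question and reproduces the expected output. The year column name and the final sort are my own additions to match the expected table, and named aggregation needs pandas 0.25 or later.

```python
import pandas as pd

# Sample rows copied from the question.
rows = [
    ('A', 50, '2016-12-01', 5), ('A', 50, '2017-01-03', 4),
    ('B', 200, '2016-12-24', 10), ('A', 50, '2017-01-18', 3),
    ('B', 200, '2017-01-28', 15), ('A', 50, '2017-01-18', 6),
    ('B', 200, '2017-01-28', 20), ('A', 50, '2017-04-18', 6),
    ('B', 200, '2017-12-08', 25), ('A', 50, '2017-11-18', 6),
    ('B', 200, '2017-08-21', 20), ('B', 200, '2017-12-28', 30),
    ('A', 50, '2018-03-18', 10), ('B', 300, '2018-06-08', 45),
    ('B', 300, '2018-09-20', 50), ('A', 50, '2018-11-18', 8),
    ('B', 300, '2018-11-28', 35),
]
df = pd.DataFrame(rows, columns=['product', 'price', 'sale_date', 'discount'])
df['sale_date'] = pd.to_datetime(df['sale_date'])

# Group by product and calendar year; the renamed Series becomes the 'year' key.
year = df['sale_date'].dt.year.rename('year')
out = (df.groupby(['product', year], sort=False)
         .agg(total_price=('price', 'sum'),
              total_discount=('discount', 'sum'),
              number_of_sales=('discount', 'size'))
         .reset_index())

# Months covered by the data in each year, as stated in the question.
out.insert(2, 'number_of_months', out['year'].map({2016: 1, 2017: 12, 2018: 11}))

# Order rows by year, then product, like the expected output.
out = out.sort_values(['year', 'product'], ignore_index=True)
print(out)
```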

Related

Loop Record in Oracle

I have a table called TaxAmount with 3 columns (ID, Year, Amount).
I want to split each row into 12 rows, one per month.
I'm new to Oracle. Please help me write an Oracle query that displays this result.
I tried ROWNUM, but no luck.
Here's one option:
SQL> select id, year, column_value as month, amount
2 from taxamount cross join
3 table(cast(multiset(select level from dual
4 connect by level <= 12
5 ) as sys.odcinumberlist))
6 order by id, year, month;
ID YEAR MONTH AMOUNT
---------- ---------- ---------- ----------
1 2022 1 100
1 2022 2 100
1 2022 3 100
1 2022 4 100
1 2022 5 100
1 2022 6 100
1 2022 7 100
1 2022 8 100
1 2022 9 100
1 2022 10 100
1 2022 11 100
1 2022 12 100
2 2022 1 200
2 2022 2 200
2 2022 3 200
2 2022 4 200
2 2022 5 200
2 2022 6 200
2 2022 7 200
2 2022 8 200
2 2022 9 200
2 2022 10 200
2 2022 11 200
2 2022 12 200
3 2022 1 150
3 2022 2 150
3 2022 3 150
3 2022 4 150
3 2022 5 150
3 2022 6 150
3 2022 7 150
3 2022 8 150
3 2022 9 150
3 2022 10 150
3 2022 11 150
3 2022 12 150
36 rows selected.
SQL>
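For readers who hit the same problem in pandas rather than Oracle, a rough equivalent of the cross join above might look like the sketch below. The sample rows are taken from the query output shown above, the variable names are made up, and how='cross' assumes pandas 1.2 or later.

```python
import pandas as pd

# Sample rows matching the query output above (ID, YEAR, AMOUNT).
taxamount = pd.DataFrame({
    'id': [1, 2, 3],
    'year': [2022, 2022, 2022],
    'amount': [100, 200, 150],
})

# One row per month, playing the role of "connect by level <= 12".
months = pd.DataFrame({'month': list(range(1, 13))})

# The cross join pairs every tax row with every month.
result = (taxamount.merge(months, how='cross')
          [['id', 'year', 'month', 'amount']]
          .sort_values(['id', 'year', 'month'], ignore_index=True))
print(result)
```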

Transposing multiple related columns

While transposing a single column is pretty straightforward, I need to transpose a large amount of data with 3 sets of 10+ related columns.
create table test
(month int,year int,po1 int,po2 int,ro1 int,ro2 int,mo1 int,mo2 int, mo3 int);
insert into test
values
(5,2013,100,20,10,1,3,4,5),(4,2014,200,30,20,2,4,5,6),(6,2015,200,80,30,3,5,6,7) ;
select * FROM test;
gives
month year po1 po2 ro1 ro2 mo1 mo2 mo3
5 2013 100 20 10 1 3 4 5
4 2014 200 30 20 2 4 5 6
6 2015 200 80 30 3 5 6 7
Transposing using UNPIVOT
select
month, year,
PO, RO, MO
from ( SELECT * from test) src
unpivot
( PO for Description in (po1, po2))unpiv1
unpivot
(RO for Description1 in (ro1, ro2)) unpiv2
unpivot
(MO for Description2 in (mo1, mo2, mo3)) unpiv3
order by year
Gives me this
month year PO RO MO
5 2013 100 10 3
5 2013 100 10 4
5 2013 100 10 5
5 2013 100 1 3
5 2013 100 1 4
5 2013 100 1 5
5 2013 20 10 3
5 2013 20 10 4
5 2013 20 10 5
5 2013 20 1 3
5 2013 20 1 4
5 2013 20 1 5
4 2014 200 20 4
4 2014 200 20 5
4 2014 200 20 6
4 2014 200 2 4
4 2014 200 2 5
4 2014 200 2 6
4 2014 30 20 4
4 2014 30 20 5
4 2014 30 20 6
4 2014 30 2 4
4 2014 30 2 5
4 2014 30 2 6
6 2015 200 30 5
6 2015 200 30 6
6 2015 200 30 7
6 2015 200 3 5
6 2015 200 3 6
6 2015 200 3 7
6 2015 80 30 5
6 2015 80 30 6
6 2015 80 30 7
6 2015 80 3 5
6 2015 80 3 6
6 2015 80 3 7
I would like to turn it into something like this. Is that possible?
month year PO RO MO
5 2013 100 10 3
5 2013 20 1 4
5 2013 0 0 5
4 2014 200 20 4
4 2014 30 2 5
4 2014 0 0 6
6 2015 200 30 5
6 2015 80 3 6
6 2015 0 0 7
Maybe use a query like the one below, which creates the rows as per your design using CROSS APPLY:
select month,year,po,ro,mo from
test cross apply
(values (po1,ro1,mo1), (po2,ro2,mo2),(0,0,mo3))v(po,ro,mo)
UNPIVOT acts similarly to UNION, so use UNION ALL in your case:
SELECT month,
year,
po1 AS PO,
ro1 AS RO,
mo1 AS MO
FROM test
UNION ALL
SELECT month,
year,
po2,
ro2,
mo2
FROM test
UNION ALL
SELECT month,
year,
0,
0,
mo3
FROM test
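The same "one SELECT per column set, stacked together" idea can be sketched in pandas for anyone doing this reshaping outside the database. This is only an illustration under my own assumptions: the groups list and the variable names are made up, and None marks a column that has no counterpart and is filled with 0, as in the desired output.

```python
import pandas as pd

# The sample table from the question.
test = pd.DataFrame({
    'month': [5, 4, 6], 'year': [2013, 2014, 2015],
    'po1': [100, 200, 200], 'po2': [20, 30, 80],
    'ro1': [10, 20, 30], 'ro2': [1, 2, 3],
    'mo1': [3, 4, 5], 'mo2': [4, 5, 6], 'mo3': [5, 6, 7],
})

# Each tuple lists the source columns for one output row per input row.
groups = [('po1', 'ro1', 'mo1'), ('po2', 'ro2', 'mo2'), (None, None, 'mo3')]

parts = []
for po, ro, mo in groups:
    part = test[['month', 'year']].copy()
    part['PO'] = test[po] if po else 0
    part['RO'] = test[ro] if ro else 0
    part['MO'] = test[mo]
    parts.append(part)

# Stacking the parts mirrors UNION ALL; the sort is only for readable output.
result = pd.concat(parts).sort_values(['year', 'MO'], ignore_index=True)
print(result)
```

With many related columns, the groups list could be generated from the column names instead of being typed by hand.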

Convert table Columns into a Hierarchy structure and accordingly change other columns count and percentage

I have a table in the below format.
Nhead Rhead Ghead Fhead Year Month Cover_count Total_Count Per_count
a b c d 2 2021 20 30 66.66
a f g h 2 2021 40 60 66.66
a f g h 3 2021 80 90 88.88
w x y z 3 2021 10 20 50
I want table output as
Head Name year Month cover_count Total_count Per_Count
Nhead a 2 2021 60 90 66.66
Rhead b 2 2021 20 30 66.66
Rhead f 2 2021 40 60 66.66
Ghead c 2 2021 20 30 66.66
Ghead g 2 2021 40 60 66.66
Fhead d 2 2021 20 30 66.66
Fhead h 2 2021 40 60 66.66
Nhead a 3 2021 80 90 88.88
Rhead f 3 2021 80 90 88.88
Ghead g 3 2021 80 30 88.88
Fhead h 3 2021 80 30 88.88
Nhead w 3 2021 10 20 50
Please help me with this query

Pandas - creating new column based on data from other records

I have a pandas dataframe with the following columns:
Day, Month, Year, City, Temperature.
I would like to add a new column holding the average (mean) temperature on the same date (day/month) across all previous years.
Can someone please assist?
Thanks :-)
Try:
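import numpy as np
import pandas as pd
# Random demo data: one temperature per day for a single city
dti = pd.date_range('2000-1-1', '2021-12-1', freq='D')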
temp = np.random.randint(10, 20, len(dti))
df = pd.DataFrame({'Day': dti.day, 'Month': dti.month, 'Year': dti.year,
'City': 'Nice', 'Temperature': temp})
out = df.set_index('Year').groupby(['City', 'Month', 'Day']) \
.expanding()['Temperature'].mean().reset_index()
Output:
>>> out
Day Month Year City Temperature
0 1 1 2000 Nice 12.000000
1 1 1 2001 Nice 12.000000
2 1 1 2002 Nice 11.333333
3 1 1 2003 Nice 12.250000
4 1 1 2004 Nice 11.800000
... ... ... ... ... ...
8001 31 12 2016 Nice 15.647059
8002 31 12 2017 Nice 15.555556
8003 31 12 2018 Nice 15.631579
8004 31 12 2019 Nice 15.750000
8005 31 12 2020 Nice 15.666667
[8006 rows x 5 columns]
Focus on 1st January of the dataset:
>>> df[df['Day'].eq(1) & df['Month'].eq(1)]
Day Month Year City Temperature # Mean
0 1 1 2000 Nice 12 # 12
366 1 1 2001 Nice 12 # 12
731 1 1 2002 Nice 10 # 11.33
1096 1 1 2003 Nice 15 # 12.25
1461 1 1 2004 Nice 10 # 11.80
1827 1 1 2005 Nice 12 # and so on
2192 1 1 2006 Nice 17
2557 1 1 2007 Nice 16
2922 1 1 2008 Nice 19
3288 1 1 2009 Nice 12
3653 1 1 2010 Nice 10
4018 1 1 2011 Nice 16
4383 1 1 2012 Nice 13
4749 1 1 2013 Nice 15
5114 1 1 2014 Nice 14
5479 1 1 2015 Nice 13
5844 1 1 2016 Nice 15
6210 1 1 2017 Nice 13
6575 1 1 2018 Nice 15
6940 1 1 2019 Nice 18
7305 1 1 2020 Nice 11
7671 1 1 2021 Nice 14
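Note that the expanding mean above includes the current year's temperature. If the goal is the mean of strictly previous years, as the question asks, one possible variant is to shift the expanding mean by one row within each group. The sketch below uses a tiny made-up dataset, and the column name PrevYearsMean is my own.

```python
import pandas as pd

# Small hypothetical dataset: one city, one calendar day, four years.
df = pd.DataFrame({
    'Day': 1, 'Month': 1, 'Year': [2000, 2001, 2002, 2003],
    'City': 'Nice', 'Temperature': [12, 12, 10, 15],
})

# Rows must be in chronological order inside each group.
df = df.sort_values('Year')

# Expanding mean, shifted down one row so each year only sees earlier years.
df['PrevYearsMean'] = (
    df.groupby(['City', 'Month', 'Day'])['Temperature']
      .transform(lambda s: s.expanding().mean().shift())
)
print(df)
```

The first year of each group has no earlier data, so it gets NaN.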

Add new column based on groupby map in pandas [duplicate]

This question already has answers here:
Remap values in pandas column with a dict, preserve NaNs
(11 answers)
Closed 2 years ago.
I have a df as shown below
product bought_date number_of_sales
A 2016 15
A 2017 10
A 2018 15
B 2016 20
B 2017 30
B 2018 20
C 2016 20
C 2017 30
C 2018 20
From the above, I would like to add one column called cost_per_unit, as shown below.
The cost of product A is 100, B is 500 and C is 200:
d1 = {'A':100, 'B':500, 'C':200}
Expected Output:
product bought_date number_of_sales cost_per_unit
A 2016 15 100
A 2017 10 100
A 2018 15 100
B 2016 20 500
B 2017 30 500
B 2018 20 500
C 2016 20 200
C 2017 30 200
C 2018 20 200
There is no need for a lambda function. Just run:
df['cost_per_unit'] = df['product'].map(d1)
Additional remark: product is also the name of a pandas method (DataFrame.product), so attribute access such as df.product returns that method instead of the column.
It is a good habit to avoid column names that shadow existing methods or attributes; they should differ at least in letter case.
You can try this:
df['cost_per_unit'] = df.apply(lambda x: d1[x['product']], axis=1)
print(df)
product bought_date number_of_sales cost_per_unit
0 A 2016 15 100
1 A 2017 10 100
2 A 2018 15 100
3 B 2016 20 500
4 B 2017 30 500
5 B 2018 20 500
6 C 2016 20 200
7 C 2017 30 200
8 C 2018 20 200
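The two answers behave differently when a product has no entry in the dictionary: Series.map leaves NaN for the missing key, while the row-wise lookup d1[...] raises a KeyError. Here is a small sketch of that difference; product 'D' and the cost_filled column are hypothetical additions for illustration.

```python
import pandas as pd

d1 = {'A': 100, 'B': 500, 'C': 200}

# 'D' is a hypothetical product with no cost entry.
df = pd.DataFrame({'product': ['A', 'B', 'C', 'D']})

# map: missing keys become NaN (so the column is float).
df['cost_per_unit'] = df['product'].map(d1)

# Optional: replace the NaN with a default cost of 0.
df['cost_filled'] = df['cost_per_unit'].fillna(0)

# apply with a plain dict lookup: the missing key raises KeyError.
raised = False
try:
    df.apply(lambda x: d1[x['product']], axis=1)
except KeyError:
    raised = True

print(df)
print('apply raised KeyError:', raised)
```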