How to take the average of every 3 rows in a particular column? - pandas

I have a dataframe like this
Year Month ProductCategory Sales(In ThousandDollars)
0 2009 1 WomenClothing 1755.0
1 2009 1 MenClothing 524.0
2 2009 1 OtherClothing 936.0
3 2009 2 WomenClothing 1729.0
4 2009 2 MenClothing 496.0
5 2009 2 OtherClothing 859.0
6 2009 3 WomenClothing 2256.0
7 2009 3 MenClothing 542.0
8 2009 3 OtherClothing 921.0
9 2009 4 WomenClothing 2662.0
10 2009 4 MenClothing 669.0
11 2009 4 OtherClothing 914.0
12 2009 5 WomenClothing 2732.0
13 2009 5 MenClothing 650.0
14 2009 5 OtherClothing 989.0
15 2009 6 WomenClothing 2220.0
16 2009 6 MenClothing 607.0
17 2009 6 OtherClothing 932.0
18 2009 7 WomenClothing 2164.0
19 2009 7 MenClothing 575.0
20 2009 7 OtherClothing 901.0
21 2009 8 WomenClothing 2371.0
22 2009 8 MenClothing 551.0
23 2009 8 OtherClothing 865.0
24 2009 9 WomenClothing 2421.0
25 2009 9 MenClothing 579.0
26 2009 9 OtherClothing 819.0
27 2009 10 WomenClothing 2579.0
28 2009 10 MenClothing 610.0
29 2009 10 OtherClothing 914.0
Every month of a year has 3 product categories (WomenClothing, MenClothing, OtherClothing), so each month is represented by 3 rows. I want to average the Sales column per month, i.e. take the average of every 3 rows and keep it as a single value for that month, which reduces the number of rows.
That is, in the end I want just one row for every month of a year.
Like this:
Year Month Average Sale of each month
0 2009 1 1071.66
3 2009 2 1028.0
6 2009 3 1239.66
9 2009 4 1415.0

You can use:
df.groupby(['Year','Month'])['Sales(In ThousandDollars)'].mean().reset_index()
Year Month Sales(In ThousandDollars)
0 2009 1 1071.666667
1 2009 2 1028.000000
2 2009 3 1239.666667
3 2009 4 1415.000000
4 2009 5 1457.000000
5 2009 6 1253.000000
6 2009 7 1213.333333
7 2009 8 1262.333333
8 2009 9 1273.000000
9 2009 10 1367.666667

You can also use the index for your grouping. It would look something like this:
df.groupby(df.index // 3).mean(numeric_only=True)
This assumes a default 0..n-1 index and exactly 3 consecutive rows per month. If your month column is consistent, so that you always have 3 rows for each month of a year, you can group by year and month to get the same result.
This gives you:
Year Month Sales(In ThousandDollars)
0 2009 1 1071.666667
1 2009 2 1028.000000
2 2009 3 1239.666667
3 2009 4 1415.000000
4 2009 5 1457.000000
5 2009 6 1253.000000
6 2009 7 1213.333333
7 2009 8 1262.333333
8 2009 9 1273.000000
9 2009 10 1367.666667
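To compare the two answers side by side, here is a minimal sketch on a hypothetical two-month sample built from the question's data. The variable names (`sales`, `by_key`, `by_pos`) are illustrative only. It assumes the default integer index for the positional variant.

```python
import pandas as pd

# Hypothetical sample: the first two months from the question's data.
df = pd.DataFrame({
    "Year": [2009] * 6,
    "Month": [1, 1, 1, 2, 2, 2],
    "ProductCategory": ["WomenClothing", "MenClothing", "OtherClothing"] * 2,
    "Sales(In ThousandDollars)": [1755.0, 524.0, 936.0, 1729.0, 496.0, 859.0],
})

sales = "Sales(In ThousandDollars)"

# Approach 1: group on the actual keys. This keeps working if a month
# has a missing row or the rows are not in order.
by_key = df.groupby(["Year", "Month"], as_index=False)[sales].mean()

# Approach 2: group on row position. Only valid if every month has exactly
# 3 consecutive rows and the index is the default 0..n-1 range.
by_pos = df.groupby(df.index // 3)[["Year", "Month", sales]].mean()

print(by_key)
print(by_pos)
```

Grouping on `Year` and `Month` is usually the safer choice, since it does not depend on row order or on every month having the same number of rows.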

Related

Transposing multiple related columns

While transposing a single column is pretty straightforward, I need to transpose a large amount of data with 3 sets of 10+ related columns.
create table test
(month int,year int,po1 int,po2 int,ro1 int,ro2 int,mo1 int,mo2 int, mo3 int);
insert into test
values
(5,2013,100,20,10,1,3,4,5),(4,2014,200,30,20,2,4,5,6),(6,2015,200,80,30,3,5,6,7) ;
select * FROM test;
gives
month year po1 po2 ro1 ro2 mo1 mo2 mo3
5 2013 100 20 10 1 3 4 5
4 2014 200 30 20 2 4 5 6
6 2015 200 80 30 3 5 6 7
Transposing using UNPIVOT
select
month, year,
PO, RO, MO
from ( SELECT * from test) src
unpivot
( PO for Description in (po1, po2))unpiv1
unpivot
(RO for Description1 in (ro1, ro2)) unpiv2
unpivot
(MO for Description2 in (mo1, mo2, mo3)) unpiv3
order by year
Gives me this
month year PO RO MO
5 2013 100 10 3
5 2013 100 10 4
5 2013 100 10 5
5 2013 100 1 3
5 2013 100 1 4
5 2013 100 1 5
5 2013 20 10 3
5 2013 20 10 4
5 2013 20 10 5
5 2013 20 1 3
5 2013 20 1 4
5 2013 20 1 5
4 2014 200 20 4
4 2014 200 20 5
4 2014 200 20 6
4 2014 200 2 4
4 2014 200 2 5
4 2014 200 2 6
4 2014 30 20 4
4 2014 30 20 5
4 2014 30 20 6
4 2014 30 2 4
4 2014 30 2 5
4 2014 30 2 6
6 2015 200 30 5
6 2015 200 30 6
6 2015 200 30 7
6 2015 200 3 5
6 2015 200 3 6
6 2015 200 3 7
6 2015 80 30 5
6 2015 80 30 6
6 2015 80 30 7
6 2015 80 3 5
6 2015 80 3 6
6 2015 80 3 7
I would like to turn it into something like this. Is that possible?
month year PO RO MO
5 2013 100 10 3
5 2013 20 1 4
5 2013 0 0 5
4 2014 200 20 4
4 2014 30 2 5
4 2014 0 0 6
6 2015 200 30 5
6 2015 80 3 6
6 2015 0 0 7
You could use a query like the one below, which uses CROSS APPLY to create the rows as per your design:
select month,year,po,ro,mo from
test cross apply
(values (po1,ro1,mo1), (po2,ro2,mo2),(0,0,mo3))v(po,ro,mo)
UNPIVOT behaves much like a union, so in your case you can use UNION ALL:
SELECT month,
year,
po1 AS PO,
ro1 AS RO,
mo1 AS MO
FROM test
UNION ALL
SELECT month,
year,
po2,
ro2,
mo2
FROM test
UNION ALL
SELECT month,
year,
0,
0,
mo3
FROM test
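As a quick check of the UNION ALL shape, the sketch below runs it against the question's sample rows from Python. It uses the standard-library `sqlite3` module with an in-memory database, which is an assumption made for convenience: SQLite is not the asker's database and does not support CROSS APPLY, so only the UNION ALL variant is exercised. The third branch selects `mo3`, and the `ORDER BY` is added only to make the row order deterministic.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the asker's server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table test (month int, year int, po1 int, po2 int, "
    "ro1 int, ro2 int, mo1 int, mo2 int, mo3 int)"
)
conn.executemany(
    "insert into test values (?,?,?,?,?,?,?,?,?)",
    [
        (5, 2013, 100, 20, 10, 1, 3, 4, 5),
        (4, 2014, 200, 30, 20, 2, 4, 5, 6),
        (6, 2015, 200, 80, 30, 3, 5, 6, 7),
    ],
)

# One SELECT per "set" of related columns; the third set has no PO/RO,
# so those positions are filled with 0.
query = """
SELECT month, year, po1 AS PO, ro1 AS RO, mo1 AS MO FROM test
UNION ALL
SELECT month, year, po2, ro2, mo2 FROM test
UNION ALL
SELECT month, year, 0, 0, mo3 FROM test
ORDER BY year, MO
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each source row yields three output rows, which matches the layout requested in the question.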

Pandas - creating new column based on data from other records

I have a pandas dataframe which has the following columns:
Day, Month, Year, City, Temperature.
I would like to have a new column that holds the average (mean) temperature on the same date (day/month) across all previous years.
Can someone please assist?
Thanks :-)
Try:
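import numpy as np
import pandas as pd

# Build a sample dataset: one row per day, random temperatures
dti = pd.date_range('2000-1-1', '2021-12-1', freq='D')
temp = np.random.randint(10, 20, len(dti))
df = pd.DataFrame({'Day': dti.day, 'Month': dti.month, 'Year': dti.year,
                   'City': 'Nice', 'Temperature': temp})
# Expanding (running) mean per city and calendar day, ordered by year
out = df.set_index('Year').groupby(['City', 'Month', 'Day']) \
        .expanding()['Temperature'].mean().reset_index()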
Output:
>>> out
Day Month Year City Temperature
0 1 1 2000 Nice 12.000000
1 1 1 2001 Nice 12.000000
2 1 1 2002 Nice 11.333333
3 1 1 2003 Nice 12.250000
4 1 1 2004 Nice 11.800000
... ... ... ... ... ...
8001 31 12 2016 Nice 15.647059
8002 31 12 2017 Nice 15.555556
8003 31 12 2018 Nice 15.631579
8004 31 12 2019 Nice 15.750000
8005 31 12 2020 Nice 15.666667
[8006 rows x 5 columns]
Focus on 1st January of the dataset:
>>> df[df['Day'].eq(1) & df['Month'].eq(1)]
Day Month Year City Temperature # Mean
0 1 1 2000 Nice 12 # 12
366 1 1 2001 Nice 12 # 12
731 1 1 2002 Nice 10 # 11.33
1096 1 1 2003 Nice 15 # 12.25
1461 1 1 2004 Nice 10 # 11.80
1827 1 1 2005 Nice 12 # and so on
2192 1 1 2006 Nice 17
2557 1 1 2007 Nice 16
2922 1 1 2008 Nice 19
3288 1 1 2009 Nice 12
3653 1 1 2010 Nice 10
4018 1 1 2011 Nice 16
4383 1 1 2012 Nice 13
4749 1 1 2013 Nice 15
5114 1 1 2014 Nice 14
5479 1 1 2015 Nice 13
5844 1 1 2016 Nice 15
6210 1 1 2017 Nice 13
6575 1 1 2018 Nice 15
6940 1 1 2019 Nice 18
7305 1 1 2020 Nice 11
7671 1 1 2021 Nice 14
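Note that an expanding mean includes the current year's value in each average. If "all previous years" is meant strictly, one possible variation is to shift the values by one row within each group before taking the expanding mean. The sketch below uses a hypothetical four-row sample (the first four 1 January values shown above) and an illustrative column name, `PrevYearsMean`, that is not part of the original answer.

```python
import pandas as pd

# Hypothetical sample: one city, one calendar day, four years.
df = pd.DataFrame({
    "Day": [1, 1, 1, 1],
    "Month": [1, 1, 1, 1],
    "Year": [2000, 2001, 2002, 2003],
    "City": ["Nice"] * 4,
    "Temperature": [12, 12, 10, 15],
})

# Rows must be in year order so that "previous rows" means "previous years".
df = df.sort_values("Year")

# shift() drops the current year from each window, so the expanding mean
# only sees earlier years; the first year of each group has no history (NaN).
df["PrevYearsMean"] = (
    df.groupby(["City", "Month", "Day"])["Temperature"]
      .transform(lambda s: s.shift().expanding().mean())
)
print(df)
```

Using `transform` keeps the result aligned with the original rows, so it can be assigned directly as a new column.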

Custom SQL for quarter count starting from previous month

I need to create a custom quarter calculator that always starts from the previous month, no matter which month and year we are in, and counts back to get the quarter. Quarters of the previous year are to be numbered 5, 6, etc.
So the goal is to move the quarter grouping one month back.
Assume we run the query on December 11th; the result should be:
YEAR MNTH QTR QTR_ALT
2017 1 1 12
2017 2 1 12
2017 3 1 11
2017 4 2 11
2017 5 2 11
2017 6 2 10
2017 7 3 10
2017 8 3 10
2017 9 3 9
2017 10 4 9
2017 11 4 9
2017 12 4 8
2018 1 1 8
2018 2 1 8
2018 3 1 7
2018 4 2 7
2018 5 2 7
2018 6 2 6
2018 7 3 6
2018 8 3 6
2018 9 3 5
2018 10 4 5
2018 11 4 5
2018 12 4 1
2019 1 1 1
2019 2 1 1
2019 3 1 2
2019 4 2 2
2019 5 2 2
2019 6 2 3
2019 7 3 3
2019 8 3 3
2019 9 3 4
2019 10 4 4
2019 11 4 4
2019 12 4 THIS IS SKIPPED
The starting point is eliminating current_date, so that the data ends at the previous month's last day:
SELECT DISTINCT
YEAR,
MNTH,
QTR
FROM TABLE
WHERE DATA BETWEEN
(SELECT DATE_TRUNC(YEAR,ADD_MONTHS(CURRENT_DATE, -24))) AND
(SELECT DATE_TRUNC(MONTH,CURRENT_DATE)-1)
ORDER BY YEAR, MNTH, QTR
The following gets you all the dates you need, with the extra columns.
select to_char(add_months(a.dt, -b.y), 'YYYY') as year,
to_char(add_months(a.dt, -b.y), 'MM') as month,
ceil(to_number(to_char(add_months(a.dt, -b.y), 'MM')) / 3) as qtr,
ceil(b.y/3) as alt_qtr
from
(select trunc(sysdate, 'MONTH') as dt from dual) a,
(select rownum as y from dual connect by level <= 24) b;
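The arithmetic in that query can be sketched in Python to make the numbering easier to follow. This is only an illustration of the answer's logic (step back `y` months from the start of the current month and number the alternative quarter as `ceil(y / 3)`). The function name `alt_quarters` and its parameters are hypothetical.

```python
from datetime import date
from math import ceil


def alt_quarters(today, months_back=24):
    """Return (year, month, qtr, alt_qtr) for each of the previous months."""
    rows = []
    for y in range(1, months_back + 1):
        # Count months from year 0 so that stepping back across a
        # year boundary is plain integer arithmetic.
        idx = today.year * 12 + (today.month - 1) - y
        year, month0 = divmod(idx, 12)
        month = month0 + 1
        # Calendar quarter of that month, and quarter counted back
        # from the previous month (1 = the most recent three months).
        rows.append((year, month, ceil(month / 3), ceil(y / 3)))
    return rows


rows = alt_quarters(date(2019, 12, 11))
for row in rows:
    print(row)
```

Run for 11 December 2019, the first three rows are November, October and September 2019, all with alternative quarter 1, and the current month is not included.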

how to avoid null while doing diff using period in pandas?

I have the dataframe below and I am calculating the difference from the previous value using diff with periods, but that makes the first value of each group null. Is there any way to fill that value?
Example:
df['cal_val'] = df.groupby('year')['val'].diff(periods=1)
current output:
date year val cal_val
1/3/10 2010 12 NaN
1/6/10 2010 15 3
1/9/10 2010 18 3
1/12/10 2010 20 2
1/3/11 2011 10 NaN
1/6/11 2011 12 2
1/9/11 2011 15 3
1/12/11 2011 18 3
expected output:
date year val cal_val
1/3/10 2010 12 12
1/6/10 2010 15 3
1/9/10 2010 18 3
1/12/10 2010 20 2
1/3/11 2011 10 10
1/6/11 2011 12 2
1/9/11 2011 15 3
1/12/11 2011 18 3
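One possible way to get the expected output, assuming the first row of each year should simply carry its own `val`, is to fill the NaN values produced by `diff` from the `val` column. The sketch below rebuilds the question's data. The `astype(int)` step is optional and only restores integer values after the NaN has been filled.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["1/3/10", "1/6/10", "1/9/10", "1/12/10",
             "1/3/11", "1/6/11", "1/9/11", "1/12/11"],
    "year": [2010] * 4 + [2011] * 4,
    "val": [12, 15, 18, 20, 10, 12, 15, 18],
})

# diff() leaves NaN on the first row of each year; fillna() with a Series
# aligns on the index, so each NaN is replaced by that row's own val.
df["cal_val"] = (
    df.groupby("year")["val"]
      .diff(periods=1)
      .fillna(df["val"])
      .astype(int)
)
print(df)
```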

Pandas: how to have preview of several rows sorted under certain columns

If I have a DataFrame (df) as:
Year Rate
2001 10
2001 3
2001 5
2001 3
2001 6
2002 2
2002 7
2002 4
2002 9
2002 8
... ...
2018 8
2018 6
2018 4
2018 6
2018 5
How do I get a DataFrame that shows only the first 2 rows of each year, like:
Year Rate
2001 10
2001 3
2002 2
2002 7
... ...
2018 8
2018 6
Thanks
Use GroupBy.head:
df1 = df.groupby('Year').head(2)
print (df1)
Year Rate
0 2001 10
1 2001 3
5 2002 2
6 2002 7
10 2018 8
11 2018 6
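`GroupBy.head` keeps rows in their original order. If the preview should instead show the two highest rates per year, one option (an assumption about the intent, not something the question states) is to sort before grouping. A minimal sketch on a hypothetical two-year sample:

```python
import pandas as pd

# Hypothetical sample: the first two years from the question's data.
df = pd.DataFrame({
    "Year": [2001] * 5 + [2002] * 5,
    "Rate": [10, 3, 5, 3, 6, 2, 7, 4, 9, 8],
})

# First two rows of each year, in original row order.
first_two = df.groupby("Year").head(2)

# Two highest rates of each year: sort within year, then take the head.
top_two = (
    df.sort_values(["Year", "Rate"], ascending=[True, False])
      .groupby("Year")
      .head(2)
)
print(first_two)
print(top_two)
```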