How to pivot a dataframe to get results by department and summed values - pandas

Is there a way to pivot the following dataframe to get results by department, with the total opened days per year summed?
Department 2012 2013 2014
0 Electronics 0 270 365
1 Electronics 0 0 0
2 Grocery 242 365 365
3 Grocery 241 365 365
Expected:
Department Year Total
0 Electronics 2012 0
1 Electronics 2013 270
2 Electronics 2014 365
3 Grocery 2012 483
4 Grocery 2013 730
5 Grocery 2014 730

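We can do a groupby with sum, then stack:
# rename_axis names the year columns so the stacked level is called 'Year' (not 'level_1')
s=df.groupby('Department').sum().rename_axis(columns='Year').stack().to_frame('Total').reset_index()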
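A minimal, self-contained sketch of the groupby/stack approach, built from the sample frame in the question. It assumes the year column labels are the strings "2012", "2013" and "2014"; if they are integers in your data, the Year values come out as integers instead. The rename_axis(columns="Year") call names the column axis so the stacked level becomes a Year column rather than level_1.

```python
import pandas as pd

# Sample data from the question (year labels assumed to be strings)
df = pd.DataFrame({
    "Department": ["Electronics", "Electronics", "Grocery", "Grocery"],
    "2012": [0, 0, 242, 241],
    "2013": [270, 0, 365, 365],
    "2014": [365, 0, 365, 365],
})

# Sum every year column per department, name the column axis "Year",
# then stack the year columns into rows and flatten the index.
s = (
    df.groupby("Department").sum()
      .rename_axis(columns="Year")
      .stack()
      .to_frame("Total")
      .reset_index()
)
print(s)
```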

Let's melt the data, group by Department and Year, and sum to get the total:
(df.melt("Department",
var_name="Year",
value_name="Total")
.groupby(["Department","Year"])
.sum()
)
Total
Department Year
Electronics 2012 0
2013 270
2014 365
Grocery 2012 483
2013 730
2014 730
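The melt answer leaves Department and Year in a MultiIndex. A small variation, assuming you want the flat layout shown under Expected, selects the Total column and calls reset_index() after the sum. The frame is again the question's sample with string year labels (an assumption about the original dtype):

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Electronics", "Electronics", "Grocery", "Grocery"],
    "2012": [0, 0, 242, 241],
    "2013": [270, 0, 365, 365],
    "2014": [365, 0, 365, 365],
})

# Long format: one row per (Department, Year) cell, then sum per pair.
result = (
    df.melt("Department", var_name="Year", value_name="Total")
      .groupby(["Department", "Year"])["Total"]
      .sum()
      .reset_index()  # turn the MultiIndex back into ordinary columns
)
print(result)
```

Because the years are four-digit strings, the default sorted groupby order is also chronological.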

Related

Percentage by year

I have this dataset
Year score count
2007 20 grade 2000
2006 20 2385
2006 20 grade 10
2006 20 grade_N 3
2005 40 grade 428
2006 40 grade 815
2006 40 grade_1 15
2006 40 grade 3
...
It is generated by:
SEL years,
Score
,count(0)
,100.0*count(0)/sum(count(*)) over () as pct
From table1
Group by 1,2
If I add the condition WHERE years = 2006, it gives me the right percentages:
2006 20 73.8
2006 20 grade 0.0
...
But if I do not specify it, it returns lower numbers.
How can I determine percentage by year?
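Try partitioning the window by year, so each count is divided by its own year's total instead of the grand total over all years:
sum(count(*)) over (partition by years)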
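To see the difference between OVER () and OVER (PARTITION BY years), here is a small sketch that runs both windows through Python's built-in sqlite3 module. The rows are hypothetical toy data (the question's sample is truncated), window functions need SQLite 3.25 or newer, and standard SELECT is used in place of the Teradata shorthand SEL.

```python
import sqlite3

# Hypothetical rows standing in for table1: (years, score)
rows = [
    (2005, 40),
    (2006, 20), (2006, 20), (2006, 40),
    (2007, 20), (2007, 20), (2007, 20),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (years INTEGER, score INTEGER)")
con.executemany("INSERT INTO table1 VALUES (?, ?)", rows)

# pct_all divides by the grand total over all years (the question's OVER ()),
# pct_year divides by the total of the row's own year (PARTITION BY years).
query = """
SELECT years,
       score,
       count(*) AS cnt,
       100.0 * count(*) / sum(count(*)) OVER () AS pct_all,
       100.0 * count(*) / sum(count(*)) OVER (PARTITION BY years) AS pct_year
FROM table1
GROUP BY years, score
ORDER BY years, score
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```

The window is evaluated after GROUP BY, so sum(count(*)) adds up the per-group counts, either across every group or only across the groups of the same year.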

Add column value to next column in SQL

My SQL table is:
Week Year Applications
1 2017 0
2 2017 10
3 2017 20
4 2017 50
5 2017 0
1 2018 10
2 2018 0
3 2018 40
4 2018 50
5 2018 10
I want an SQL query that gives the output below:
Week Year Applications
1 2017 0
2 2017 10
3 2017 30
4 2017 80
5 2017 80
1 2018 10
2 2018 10
3 2018 50
4 2018 100
5 2018 110
Can anyone help me write this query?
You could use SUM() OVER to get cumulative sum:
SELECT *, SUM(Applications) OVER(PARTITION BY Year ORDER BY Week)
FROM tab
It looks like you want a cumulative sum:
select week, year,
sum(applications) over (partition by year order by week) as cumulative_applications
from t;
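A quick check of the cumulative-sum answer against the question's data, run with Python's sqlite3 module. This assumes SQLite 3.25+ for window functions; the table name t follows the second answer.

```python
import sqlite3

# (week, year, applications) rows from the question
data = [
    (1, 2017, 0), (2, 2017, 10), (3, 2017, 20), (4, 2017, 50), (5, 2017, 0),
    (1, 2018, 10), (2, 2018, 0), (3, 2018, 40), (4, 2018, 50), (5, 2018, 10),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (week INTEGER, year INTEGER, applications INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", data)

# Running total restarts for each year and accumulates in week order.
query = """
SELECT week, year,
       SUM(applications) OVER (PARTITION BY year ORDER BY week) AS cumulative_applications
FROM t
ORDER BY year, week
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```

With ORDER BY week and no explicit frame, the window defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows sharing the same week within a year would be summed together; that does not happen with this data.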

Pandas: Group by two columns to get sum of another column

I looked through most of the previously asked questions but was not able to find an answer to my question.
I have the following DataFrame:
id year month score num_attempts
0 483625 2010 01 50 1
1 967799 2009 03 50 1
2 213473 2005 09 100 1
3 498110 2010 12 60 1
5 187243 2010 01 100 1
6 508311 2005 10 15 1
7 486688 2005 10 50 1
8 212550 2005 10 500 1
10 136701 2005 09 25 1
11 471651 2010 01 50 1
I want to get the following DataFrame:
year month sum_score sum_num_attempts
2009 03 50 1
2005 09 125 2
2010 12 60 1
2010 01 200 3
2005 10 565 3
Here is what I tried:
sum_df = df.groupby(by=['year','month'])['score'].sum()
But this doesn't look efficient or correct. If I have more than one column that needs to be aggregated, this seems like a very expensive call; for example, I also want num_attempts summed by year and month, the same way as score.
This should be an efficient way:
sum_df = df.groupby(['year','month']).agg({'score': 'sum', 'num_attempts': 'sum'})
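If you also want the output columns named as in the expected frame, pandas' named aggregation (available since pandas 0.25) renames while aggregating. This sketch uses the question's rows; keeping month as zero-padded strings is an assumption about the original dtype.

```python
import pandas as pd

# Rows from the question
df = pd.DataFrame({
    "id": [483625, 967799, 213473, 498110, 187243, 508311, 486688, 212550, 136701, 471651],
    "year": [2010, 2009, 2005, 2010, 2010, 2005, 2005, 2005, 2005, 2010],
    "month": ["01", "03", "09", "12", "01", "10", "10", "10", "09", "01"],
    "score": [50, 50, 100, 60, 100, 15, 50, 500, 25, 50],
    "num_attempts": [1] * 10,
})

# Each keyword becomes an output column: new_name=(source_column, function)
sum_df = (
    df.groupby(["year", "month"])
      .agg(sum_score=("score", "sum"), sum_num_attempts=("num_attempts", "sum"))
      .reset_index()
)
print(sum_df)
```

Note that 2010/01 has three rows in the sample, so its summed attempt count is 3. The grouping is set up once and shared by every aggregated column, so adding columns does not require a separate groupby per column.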

Conditional Logic within SUM

I'm currently combining two tables through a UNION ALL query and performing SUM and GROUP BY operations on the result. Everything is working as expected, but I have a unique requirement which I can't seem to figure out how to implement.
My aim is to write SQL that says "when the DEV_AGE column is >= 12, set the REVENUE value to what it would be if this column were 12". I provide the tables below, as I know this description can be a bit confusing:
REVENUE table:
ACC_YR DEV_AGE STATE REVENUE LOSS
2012 3 MA 4000 0
2012 6 MA 8000 0
2012 9 MA 12000 0
2012 12 MA 16000 0
LOSS table:
ACC_YR DEV_AGE STATE REVENUE LOSS
2012 3 MA 0 2000
2012 6 MA 0 7000
2012 9 MA 0 9000
2012 12 MA 0 10000
2012 15 MA 0 14000
2012 18 MA 0 14000
2012 21 MA 0 14000
2012 24 MA 0 15000
2012 27 MA 0 17000
Table after UNION ALL, GROUP BY, SUM:
ACC_YR DEV_AGE STATE REVENUE LOSS
2012 3 MA 4000 2000
2012 6 MA 8000 7000
2012 9 MA 12000 9000
2012 12 MA 16000 10000
2012 15 MA 0 14000
2012 18 MA 0 14000
2012 21 MA 0 14000
2012 24 MA 0 15000
2012 27 MA 0 17000
What I WANT to accomplish:
ACC_YR DEV_AGE STATE REVENUE LOSS
2012 3 MA 4000 2000
2012 6 MA 8000 7000
2012 9 MA 12000 9000
2012 12 MA 16000 10000
2012 15 MA 16000 14000
2012 18 MA 16000 14000
2012 21 MA 16000 14000
2012 24 MA 16000 15000
2012 27 MA 16000 17000
In other words, my REVENUE stops developing at a DEV_AGE of 12 (there are no rows in the REVENUE table beyond a DEV_AGE of 12), but I want every DEV_AGE beyond 12 to equal what the REVENUE was at 12 in the final table.
Here is an approach that uses window functions to calculate the revenue for age 12 and then logic to assign it:
select acc_yr, dev_age, state,
(case when dev_age > 12 then rev12 else revenue end) as revenue, loss
from (select l.acc_yr, l.dev_age, l.state, r.revenue, l.loss,
max(case when l.dev_age = 12 then r.revenue end) over (partition by l.acc_yr, l.state) as rev12
from loss l left join
revenue r
on l.acc_yr = r.acc_yr and l.dev_age = r.dev_age and l.state = r.state
) lr;
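The query, with the join condition using r.state, can be checked end to end with Python's sqlite3 module and the question's two tables. This sketch assumes SQLite 3.25+ and adds an ORDER BY so the rows come back in DEV_AGE order.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cols = "(acc_yr INTEGER, dev_age INTEGER, state TEXT, revenue INTEGER, loss INTEGER)"
con.execute("CREATE TABLE revenue " + cols)
con.execute("CREATE TABLE loss " + cols)

# REVENUE and LOSS tables from the question
rev_rows = [(3, 4000), (6, 8000), (9, 12000), (12, 16000)]
loss_rows = [(3, 2000), (6, 7000), (9, 9000), (12, 10000), (15, 14000),
             (18, 14000), (21, 14000), (24, 15000), (27, 17000)]
con.executemany("INSERT INTO revenue VALUES (?, ?, ?, ?, ?)",
                [(2012, age, "MA", rev, 0) for age, rev in rev_rows])
con.executemany("INSERT INTO loss VALUES (?, ?, ?, ?, ?)",
                [(2012, age, "MA", 0, amt) for age, amt in loss_rows])

# rev12 is the revenue at dev_age 12, copied to every row of the same
# acc_yr/state by the window; the outer CASE uses it beyond age 12.
query = """
select acc_yr, dev_age, state,
       (case when dev_age > 12 then rev12 else revenue end) as revenue, loss
from (select l.acc_yr, l.dev_age, l.state, r.revenue, l.loss,
             max(case when l.dev_age = 12 then r.revenue end)
                 over (partition by l.acc_yr, l.state) as rev12
      from loss l left join
           revenue r
           on l.acc_yr = r.acc_yr and l.dev_age = r.dev_age and l.state = r.state
     ) lr
order by acc_yr, state, dev_age
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```

The answer uses dev_age > 12 rather than >= 12; at exactly 12 the row's own revenue already equals rev12, so both give the same result. Because the join starts from the loss table, a DEV_AGE that appears only in the revenue table would not show up.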

Count and WHERE conditions lead to performance issues?

I am working on a table with a million rows. The table looks like this:
Department year Candidate Spent Saved
Electrical 2013 A 50 50
Electrical 2013 B 25 50
Electrical 2013 C 11 50
Electrical 2013 D 25 0
Electrical 2013 Dt 86 50
Electrical 2014 AA 50 50
Electrical 2014 BB 25 0
Electrical 2014 CH 11 50
Electrical 2014 DG 25 0
Electrical 2014 DH 0 50
Computers 2013 Ax 50 50
Computers 2013 Bc 25 50
Computers 2013 Cx 11 50
Computers 2013 Dx 25 0
Computers 2013 Dx 86 50
I am looking for output like this:
Department year NoOfCandidates NoOfCandidatesWith50$save NoOfCandidatesWith0$save
Electrical 2013 5 4 1
Electrical 2014 5 3 2
Computers 2013 5 4 1
I am using a #TEMP table for every COUNT with a WHERE condition and LEFT OUTER JOINing them at the end, so it takes a long time.
Is there any way to get better performance on the above table?
Thanks in advance.
You want to do this as a single aggregation query. There is no need for temporary tables:
select department, year, count(*) as NumCandidates,
sum(case when saved = 50 then 1 else 0 end) as NumCandidatesWith50Save,
sum(case when saved = 0 then 1 else 0 end) as NumCandidatesWith0Save
from candidates t -- "candidates" is a placeholder; use your table's name
group by department, year
order by 1, 2;
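A sketch that runs the single aggregation query on the sample rows with Python's sqlite3 module; candidates is a hypothetical table name, and the column names follow the answer's query.

```python
import sqlite3

# (department, year, candidate, spent, saved) rows from the question
data = [
    ("Electrical", 2013, "A", 50, 50), ("Electrical", 2013, "B", 25, 50),
    ("Electrical", 2013, "C", 11, 50), ("Electrical", 2013, "D", 25, 0),
    ("Electrical", 2013, "Dt", 86, 50),
    ("Electrical", 2014, "AA", 50, 50), ("Electrical", 2014, "BB", 25, 0),
    ("Electrical", 2014, "CH", 11, 50), ("Electrical", 2014, "DG", 25, 0),
    ("Electrical", 2014, "DH", 0, 50),
    ("Computers", 2013, "Ax", 50, 50), ("Computers", 2013, "Bc", 25, 50),
    ("Computers", 2013, "Cx", 11, 50), ("Computers", 2013, "Dx", 25, 0),
    ("Computers", 2013, "Dx", 86, 50),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE candidates (department TEXT, year INTEGER, "
            "candidate TEXT, spent INTEGER, saved INTEGER)")
con.executemany("INSERT INTO candidates VALUES (?, ?, ?, ?, ?)", data)

# Conditional aggregation: each CASE turns the condition into 1/0 and SUM counts the 1s.
query = """
select department, year, count(*) as NumCandidates,
       sum(case when saved = 50 then 1 else 0 end) as NumCandidatesWith50Save,
       sum(case when saved = 0 then 1 else 0 end) as NumCandidatesWith0Save
from candidates t
group by department, year
order by 1, 2
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```

All three counts come from one GROUP BY over the table, so there are no intermediate temp tables to build and join.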