Do COUNT and WHERE conditions lead to performance issues? - sql

I am working on a table with about a million rows. The table looks like this:
Department year Candidate Spent Saved
Electrical 2013 A 50 50
Electrical 2013 B 25 50
Electrical 2013 C 11 50
Electrical 2013 D 25 0
Electrical 2013 Dt 86 50
Electrical 2014 AA 50 50
Electrical 2014 BB 25 0
Electrical 2014 CH 11 50
Electrical 2014 DG 25 0
Electrical 2014 DH 0 50
Computers 2013 Ax 50 50
Computers 2013 Bc 25 50
Computers 2013 Cx 11 50
Computers 2013 Dx 25 0
Computers 2013 Dx 86 50
I am looking for output like this:
Department year NoOfCandidates NoOfCandidatesWith50$save NoOfCandidatesWith0$save
Electrical 2013 5 4 1
Electrical 2014 5 3 2
Computers 2013 5 4 1
I am using a #TEMP table for each conditional count and LEFT OUTER JOINing them all at the end, so it takes a long time.
Is there a way to make this perform better for the table above?
Thanks in advance.

You want to do this as a single aggregation query. There is no need for temporary tables:
select department, year, count(*) as NumCandidates,
sum(case when saved = 50 then 1 else 0 end) as NumCandidatesWith50Save,
sum(case when saved = 0 then 1 else 0 end) as NumCandidatesWith0Save
from your_table t -- "table" is a reserved word; your_table is a placeholder for the real table name
group by department, year
order by 1, 2;
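To check the single-pass aggregation against the sample data, here is a small sketch that runs the same query in an in-memory SQLite database through Python's built-in sqlite3 module. SQLite only stands in for whatever engine the question uses (the #TEMP tables suggest SQL Server), and the table name candidates and the snake_case column names are assumptions for illustration.

```python
import sqlite3

# Sample rows from the question: (department, year, candidate, spent, saved).
ROWS = [
    ("Electrical", 2013, "A", 50, 50),
    ("Electrical", 2013, "B", 25, 50),
    ("Electrical", 2013, "C", 11, 50),
    ("Electrical", 2013, "D", 25, 0),
    ("Electrical", 2013, "Dt", 86, 50),
    ("Electrical", 2014, "AA", 50, 50),
    ("Electrical", 2014, "BB", 25, 0),
    ("Electrical", 2014, "CH", 11, 50),
    ("Electrical", 2014, "DG", 25, 0),
    ("Electrical", 2014, "DH", 0, 50),
    ("Computers", 2013, "Ax", 50, 50),
    ("Computers", 2013, "Bc", 25, 50),
    ("Computers", 2013, "Cx", 11, 50),
    ("Computers", 2013, "Dx", 25, 0),
    ("Computers", 2013, "Dx", 86, 50),
]

# One scan, one GROUP BY: each conditional count is a SUM over a CASE expression.
QUERY = """
SELECT department, year,
       COUNT(*) AS num_candidates,
       SUM(CASE WHEN saved = 50 THEN 1 ELSE 0 END) AS num_with_50_save,
       SUM(CASE WHEN saved = 0 THEN 1 ELSE 0 END) AS num_with_0_save
FROM candidates  -- hypothetical table name
GROUP BY department, year
ORDER BY department, year
"""


def candidate_counts(rows):
    """Load the rows into an in-memory table and run the aggregation."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE candidates (department TEXT, year INTEGER, "
            "candidate TEXT, spent INTEGER, saved INTEGER)"
        )
        con.executemany("INSERT INTO candidates VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in candidate_counts(ROWS):
    print(row)
```

The rows come back in the same shape as the desired output. On a large table the gain should come from reading the table once, instead of once per #TEMP table plus the final joins.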

Related

SQL Query group by Postcode multiple Sums

I have the following data:
ID Weight Postcode Year
1 23 56222 2022
2 24 56332 2022
3 50 56442 2022
4 22 62331 2022
5 80 72130 2022
I want to query it so that the data is grouped by postcode (the expected output below uses the first two digits) and split into weight ranges, with each cell holding the count of entries:
Postcode/Weight 0-20 21-40 41-60 61-80 81-100
56 0 2 1 0 0
62 0 1 0 0 0
72 0 0 0 1 0
Is there any way to query this in SQL?
Try this one.
Query:
SELECT
p.postcode,
-- COUNT(DISTINCT ...) stops the chained LEFT JOINs from multiplying the counts
COUNT(DISTINCT p20.id) as "0-20",
COUNT(DISTINCT p40.id) as "21-40",
COUNT(DISTINCT p60.id) as "41-60",
COUNT(DISTINCT p80.id) as "61-80",
COUNT(DISTINCT p100.id) as "81-100"
FROM packs p
LEFT JOIN packs p20 ON p20.postcode=p.postcode AND p20.weight >= 0 AND p20.weight <= 20
LEFT JOIN packs p40 ON p40.postcode=p.postcode AND p40.weight >= 21 AND p40.weight <= 40
LEFT JOIN packs p60 ON p60.postcode=p.postcode AND p60.weight >= 41 AND p60.weight <= 60
LEFT JOIN packs p80 ON p80.postcode=p.postcode AND p80.weight >= 61 AND p80.weight <= 80
LEFT JOIN packs p100 ON p100.postcode=p.postcode AND p100.weight >= 81 AND p100.weight <= 100
-- Groups by the full postcode; the expected output groups by its first two digits
GROUP BY p.postcode;
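An alternative that avoids the self-joins entirely is conditional aggregation: one pass over the table and one SUM(CASE ...) per weight range. The sketch below runs it in SQLite through Python's sqlite3 module. The packs table name comes from the answer; grouping on substr(postcode, 1, 2) is an assumption read off the question's expected output (56, 62, 72), and other engines spell the prefix function differently (for example LEFT or SUBSTRING). If postcodes can start with 0, they should be stored as text so the prefix is not lost.

```python
import sqlite3

# Sample rows from the question: (id, weight, postcode, year).
ROWS = [
    (1, 23, 56222, 2022),
    (2, 24, 56332, 2022),
    (3, 50, 56442, 2022),
    (4, 22, 62331, 2022),
    (5, 80, 72130, 2022),
]

# Group on the first two digits of the postcode (assumed from the expected
# output) and count each weight range with a SUM over a CASE expression.
QUERY = """
SELECT substr(CAST(postcode AS TEXT), 1, 2) AS prefix,
       SUM(CASE WHEN weight BETWEEN 0 AND 20 THEN 1 ELSE 0 END) AS "0-20",
       SUM(CASE WHEN weight BETWEEN 21 AND 40 THEN 1 ELSE 0 END) AS "21-40",
       SUM(CASE WHEN weight BETWEEN 41 AND 60 THEN 1 ELSE 0 END) AS "41-60",
       SUM(CASE WHEN weight BETWEEN 61 AND 80 THEN 1 ELSE 0 END) AS "61-80",
       SUM(CASE WHEN weight BETWEEN 81 AND 100 THEN 1 ELSE 0 END) AS "81-100"
FROM packs
GROUP BY prefix
ORDER BY prefix
"""


def weight_buckets(rows):
    """Load the sample rows and return one row of range counts per prefix."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE packs (id INTEGER PRIMARY KEY, weight INTEGER, "
            "postcode INTEGER, year INTEGER)"
        )
        con.executemany("INSERT INTO packs VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in weight_buckets(ROWS):
    print(row)
```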

Compare data for a specific column grouping and update based on criteria

I have a table with the following structure:
Employee Project Task Accomplishment Score Year
John A 1 5 60 2016
John A 1 6 40 2018
John A 2 3 30 2016
Simon B 2 0 30 2017
Simon B 2 4 30 2019
David C 1 3 20 2015
David C 1 2 40 2016
David C 3 0 25 2017
David C 3 5 35 2017
I want to create a view in Oracle SQL from the above table that looks like this:
Employee Project Task Accomplishment Score Year UpdateScore Comment
John A 1 5 60 2016 60
John A 1 6 40 2018 100 (=60+40)
John A 2 3 30 2016 30
Simon B 2 0 30 2017 30
Simon B 2 4 40 2019 40 (no update because Accomplishment was 0)
David C 1 3 20 2015 20
David C 1 2 40 2016 60 (=20+40)
David C 3 0 25 2017 25
David C 3 5 35 2017 35 (no update because Accomplishment was 0)
The grouping is Employee-Project-Task.
The rule for the UpdateScore column:
If, within a given Employee-Project-Task group, the Accomplishment value for the previous year is greater than 0, add the previous year's Score to the Score of the latest year in that group.
For example, John-A-1 is a different group from John-A-2. For John-A-1 the Accomplishment in 2016 is 5 (greater than 0), so the 2016 Score is added to the 2018 Score for John-A-1 and the updated score becomes 100.
For Simon-B-2, the Accomplishment in 2017 was 0, so the 2019 score for Simon-B-2 is not updated.
Note: I don't need the Comment field, it is there just for more clarification.
Use analytic functions to determine if there was a score for the previous year, and if so, add it to the UpdatedScore.
select Employee, Project, Task, Accomplishment, Score, Year,
case when lag(Year) over (partition by Employee, Project, Task order by Year) = Year - 1
then lag(Score) over (partition by Employee, Project, Task order by Year)
else 0
end + Score as UpdatedScore -- adds the prior Score only when it is from exactly Year - 1
from EmployeeScore;
This is a bit strange -- you are counting the accomplishment of 0 in one year but not the next. Okay.
Use analytic functions:
select t.*,
(case when lag(accomplishment) over (partition by Employee, Project, Task order by year) > 0
then lag(score) over (partition by Employee, Project, Task order by year)
else 0
end) + score as update_score
from t;
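To sanity-check the LAG logic in this second answer, the sketch below runs the same expression in SQLite (window functions need SQLite 3.25 or later) through Python's sqlite3 module; the table name t follows that answer. One caveat shows up in the sample data: David-C-3 has two rows in 2017, so ORDER BY year alone does not fix which of them comes first, and the result for that group depends on the engine. The check therefore skips that group; a real view would need an extra tiebreaker column in the ORDER BY.

```python
import sqlite3

# Sample rows from the question: (employee, project, task, accomplishment, score, year).
ROWS = [
    ("John", "A", 1, 5, 60, 2016),
    ("John", "A", 1, 6, 40, 2018),
    ("John", "A", 2, 3, 30, 2016),
    ("Simon", "B", 2, 0, 30, 2017),
    ("Simon", "B", 2, 4, 30, 2019),
    ("David", "C", 1, 3, 20, 2015),
    ("David", "C", 1, 2, 40, 2016),
    ("David", "C", 3, 0, 25, 2017),
    ("David", "C", 3, 5, 35, 2017),
]

# LAG looks at the previous row of the same Employee-Project-Task group;
# for the first row LAG is NULL, the comparison is not true, and 0 is added.
QUERY = """
SELECT employee, project, task, year, score,
       (CASE WHEN LAG(accomplishment) OVER (PARTITION BY employee, project, task ORDER BY year) > 0
             THEN LAG(score) OVER (PARTITION BY employee, project, task ORDER BY year)
             ELSE 0
        END) + score AS update_score
FROM t
"""


def update_scores(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE t (employee TEXT, project TEXT, task INTEGER, "
            "accomplishment INTEGER, score INTEGER, year INTEGER)"
        )
        con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


# Index by (employee, project, task, year), skipping the tied David-C-3 rows.
scores = {
    (emp, proj, task, year): upd
    for emp, proj, task, year, _score, upd in update_scores(ROWS)
    if (emp, proj, task) != ("David", "C", 3)
}
print(scores)
```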

Pandas: Group by two columns to get sum of another column

I looked through most of the previously asked questions but could not find an answer to mine.
I have the following DataFrame:
id year month score num_attempts
0 483625 2010 01 50 1
1 967799 2009 03 50 1
2 213473 2005 09 100 1
3 498110 2010 12 60 1
5 187243 2010 01 100 1
6 508311 2005 10 15 1
7 486688 2005 10 50 1
8 212550 2005 10 500 1
10 136701 2005 09 25 1
11 471651 2010 01 50 1
I want to get the following DataFrame:
year month sum_score sum_num_attempts
2009 03 50 1
2005 09 125 2
2010 12 60 1
2010 01 200 3
2005 10 565 3
Here is what I tried:
sum_df = df.groupby(by=['year','month'])['score'].sum()
But this doesn't look efficient or correct. If I have more than one column to aggregate, this seems like a very expensive call. For example, I also have a num_attempts column that I want to sum by year and month, just like score.
This should be an efficient way:
sum_df = df.groupby(['year','month']).agg({'score': 'sum', 'num_attempts': 'sum'})
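For readers without pandas at hand, here is a plain-Python sketch of the result that groupby/agg call describes: one grouping pass in which each (year, month) key accumulates both sums. It only illustrates the computation and is not how pandas implements it; keeping month as a zero-padded string is an assumption about how the column is stored. Note that the 2010/01 group contains three rows, so its num_attempts total is 3.

```python
from collections import defaultdict

# Rows from the question: (id, year, month, score, num_attempts); the index column is dropped.
ROWS = [
    (483625, 2010, "01", 50, 1),
    (967799, 2009, "03", 50, 1),
    (213473, 2005, "09", 100, 1),
    (498110, 2010, "12", 60, 1),
    (187243, 2010, "01", 100, 1),
    (508311, 2005, "10", 15, 1),
    (486688, 2005, "10", 50, 1),
    (212550, 2005, "10", 500, 1),
    (136701, 2005, "09", 25, 1),
    (471651, 2010, "01", 50, 1),
]


def sum_by_year_month(rows):
    """Return {(year, month): (sum_score, sum_num_attempts)} in a single pass."""
    totals = defaultdict(lambda: [0, 0])
    for _id, year, month, score, attempts in rows:
        acc = totals[(year, month)]
        acc[0] += score      # running sum of score for this group
        acc[1] += attempts   # running sum of num_attempts for this group
    return {key: tuple(sums) for key, sums in totals.items()}


for key, sums in sorted(sum_by_year_month(ROWS).items()):
    print(key, sums)
```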

Conditional Logic within SUM

I'm currently combining two tables through a UNION ALL query and performing SUM and GROUP BY operations on the result. Everything is working as expected, but I have a unique requirement which I can't seem to figure out how to implement.
My aim is to write SQL that says "when the DEV_AGE column is >= 12, set the REVENUE value to what it would be if this column were 12". I provide the data below because I know this description can be a bit confusing:
REVENUE table:
ACC_YR DEV_AGE STATE REVENUE LOSS
2012 3 MA 4000 0
2012 6 MA 8000 0
2012 9 MA 12000 0
2012 12 MA 16000 0
LOSS table:
ACC_YR DEV_AGE STATE REVENUE LOSS
2012 3 MA 0 2000
2012 6 MA 0 7000
2012 9 MA 0 9000
2012 12 MA 0 10000
2012 15 MA 0 14000
2012 18 MA 0 14000
2012 21 MA 0 14000
2012 24 MA 0 15000
2012 27 MA 0 17000
Table after UNION ALL, GROUP BY, SUM:
ACC_YR DEV_AGE STATE REVENUE LOSS
2012 3 MA 4000 2000
2012 6 MA 8000 7000
2012 9 MA 12000 9000
2012 12 MA 16000 10000
2012 15 MA 0 14000
2012 18 MA 0 14000
2012 21 MA 0 14000
2012 24 MA 0 15000
2012 27 MA 0 17000
What I WANT to accomplish:
ACC_YR DEV_AGE STATE REVENUE LOSS
2012 3 MA 4000 2000
2012 6 MA 8000 7000
2012 9 MA 12000 9000
2012 12 MA 16000 10000
2012 15 MA 16000 14000
2012 18 MA 16000 14000
2012 21 MA 16000 14000
2012 24 MA 16000 15000
2012 27 MA 16000 17000
In other words, my REVENUE stops developing at a DEV_AGE of 12 (there are no rows in the REVENUE table beyond a DEV_AGE of 12), but in the final table I want REVENUE at every DEV_AGE beyond 12 to equal its value at 12.
Here is an approach that uses window functions to calculate the revenue for age 12 and then logic to assign it:
select acc_yr, dev_age, state,
(case when dev_age > 12 then rev12 else revenue end) as revenue, loss
from (select l.acc_yr, l.dev_age, l.state, r.revenue, l.loss,
max(case when l.dev_age = 12 then r.revenue end) over (partition by l.acc_yr, l.state) as rev12
from loss l left join
revenue r
on l.acc_yr = r.acc_yr and l.dev_age = r.dev_age and l.state = r.state
) lr;
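The window-function approach can be checked against the sample REVENUE and LOSS tables with the sketch below, which runs the query (with the join alias r used consistently) in SQLite through Python's sqlite3 module. The table and column names follow the question; the final ORDER BY is an addition so the rows come back in DEV_AGE order.

```python
import sqlite3

# Sample data from the question: (acc_yr, dev_age, state, revenue, loss).
REVENUE_ROWS = [
    (2012, 3, "MA", 4000, 0),
    (2012, 6, "MA", 8000, 0),
    (2012, 9, "MA", 12000, 0),
    (2012, 12, "MA", 16000, 0),
]
LOSS_ROWS = [
    (2012, age, "MA", 0, loss)
    for age, loss in [(3, 2000), (6, 7000), (9, 9000), (12, 10000), (15, 14000),
                      (18, 14000), (21, 14000), (24, 15000), (27, 17000)]
]

# rev12 is the revenue at dev_age 12, copied onto every row of the same acc_yr/state;
# rows beyond 12 take rev12 instead of their (missing) joined revenue.
QUERY = """
SELECT acc_yr, dev_age, state,
       (CASE WHEN dev_age > 12 THEN rev12 ELSE revenue END) AS revenue, loss
FROM (SELECT l.acc_yr, l.dev_age, l.state, r.revenue, l.loss,
             MAX(CASE WHEN l.dev_age = 12 THEN r.revenue END)
                 OVER (PARTITION BY l.acc_yr, l.state) AS rev12
      FROM loss l
      LEFT JOIN revenue r
        ON l.acc_yr = r.acc_yr AND l.dev_age = r.dev_age AND l.state = r.state) lr
ORDER BY acc_yr, state, dev_age
"""


def revenue_carried_forward():
    con = sqlite3.connect(":memory:")
    try:
        for table, rows in (("revenue", REVENUE_ROWS), ("loss", LOSS_ROWS)):
            con.execute(
                f"CREATE TABLE {table} (acc_yr INTEGER, dev_age INTEGER, "
                "state TEXT, revenue INTEGER, loss INTEGER)"
            )
            con.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in revenue_carried_forward():
    print(row)
```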

Normalize a Table That Contains Monthly, Yearly and Quarterly Data

How do I normalize this table:
Frequency (PK) Year (PK) Quarter (PK) Month (PK) Value
Monthly 2013 1 1 1
Quarterly 2013 1 0 2
Yearly 2013 0 0 3
The table is not in 2nd normal form, because when Frequency = Yearly, Value depends on only a subset of the primary key (Frequency, Year).
I've thought about adding a surrogate key. Then the Quarter and Month columns could be nullable.
Surrogate (PK) Frequency Year Quarter Month Value
1 Monthly 2013 1 1 1
2 Quarterly 2013 1 NULL 2
3 Yearly 2013 NULL NULL 3
But this doesn't solve the problem, because the 2nd normal form definition also applies to candidate keys. Splitting the table into three tables by Frequency doesn't sound like a good idea either, because it would introduce if statements into my business logic:
if (frequency == Monthly) then select from DataMonthly
I'm going to assume that a couple of years' worth of data might look something like this. Correct me if I'm wrong. (I'm going to ignore the question of whether using zeroes is a good idea or a bad idea.)
Frequency Year Quarter Month Value
--
Monthly 2012 1 1 1
Monthly 2012 1 2 2
Monthly 2012 1 3 3
Monthly 2012 2 4 4
Monthly 2012 2 5 5
Monthly 2012 2 6 6
Monthly 2012 3 7 7
Monthly 2012 3 8 8
Monthly 2012 3 9 9
Monthly 2012 4 10 10
Monthly 2012 4 11 11
Monthly 2012 4 12 12
Quarterly 2012 1 0 2
Quarterly 2012 2 0 5
Quarterly 2012 3 0 8
Quarterly 2012 4 0 11
Yearly 2012 0 0 3
Monthly 2013 1 1 1
Monthly 2013 1 2 2
Monthly 2013 1 3 3
Monthly 2013 2 4 4
Monthly 2013 2 5 5
Monthly 2013 2 6 6
Monthly 2013 3 7 7
Monthly 2013 3 8 8
Monthly 2013 3 9 9
Monthly 2013 4 10 10
Monthly 2013 4 11 11
Monthly 2013 4 12 12
Quarterly 2013 1 0 2
Quarterly 2013 2 0 5
Quarterly 2013 3 0 8
Quarterly 2013 4 0 11
Yearly 2013 0 0 3
From that data we can deduce two functional dependencies. A functional dependency answers the question, "Given one value for the set of attributes 'X', do we know one and only one value for the set of attributes 'Y'?"
{Year, Quarter, Month}->Frequency
{Year, Quarter, Month}->Value
Given one value for the set of attributes {Year, Quarter, Month}, we know one and only one value for the set of attributes {Frequency}. And given one value for the set of attributes {Year, Quarter, Month}, we know one and only one value for the set of attributes {Value}.
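The question "given one value of X, do we know exactly one value of Y?" can be checked mechanically against sample data. The sketch below rebuilds the two years of rows shown above and tests each dependency; sample_rows and holds are hypothetical helper names. Sample data can only refute a dependency, never prove it: the real constraint comes from what the data means.

```python
def sample_rows(years=(2012, 2013)):
    """Rebuild the sample rows: 12 monthly, 4 quarterly and 1 yearly row per year."""
    rows = []
    for year in years:
        for month in range(1, 13):
            rows.append({"Frequency": "Monthly", "Year": year,
                         "Quarter": (month - 1) // 3 + 1, "Month": month, "Value": month})
        for quarter in range(1, 5):
            rows.append({"Frequency": "Quarterly", "Year": year,
                         "Quarter": quarter, "Month": 0, "Value": 3 * quarter - 1})
        rows.append({"Frequency": "Yearly", "Year": year,
                     "Quarter": 0, "Month": 0, "Value": 3})
    return rows


def holds(rows, lhs, rhs):
    """True if every value of the lhs attributes maps to a single value of the rhs attributes."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same X, two different Ys: the dependency is violated
    return True


rows = sample_rows()
print(holds(rows, ("Year", "Quarter", "Month"), ("Frequency",)))  # {Year, Quarter, Month} -> Frequency
print(holds(rows, ("Year", "Quarter", "Month"), ("Value",)))      # {Year, Quarter, Month} -> Value
print(holds(rows, ("Frequency", "Year"), ("Value",)))             # fails on the monthly rows
```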
The problem you were running into came from including "Frequency" in the primary key. It really isn't part of the key.
This table could probably do without the [Frequency] and [Quarter] columns.
Why do you want to keep these? Is there any added value in having the quarterly and yearly values precalculated in this table? Comment: Because its values are not just the sum of its months.
So [Quarter] is mandatory.
This will work too:
Year (PK) Quarter (PK) Month (PK) Value
2013 1 1 1
2013 1 0 2
2013 0 0 3
Yearly results:
SELECT
[Value]
FROM [Table1]
WHERE [Year] = 2013 AND [Quarter] = 0 AND [Month] = 0
Quarterly results:
SELECT
[Value]
FROM [Table1]
WHERE [Year] = 2013 AND [Quarter] = 1 AND [Month] = 0
Monthly results:
SELECT
[Value] AS [Results]
FROM [Table1]
WHERE [Year] = 2013 AND [Quarter] = 1 AND [Month] = 1
Would this work for you?
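The three lookups in this answer can be tried against a table keyed only on (Year, Quarter, Month). The sketch uses SQLite through Python's sqlite3 module, keeps the Table1 name from the answer, and uses plain identifiers instead of square-bracket quoting; the convention that 0 means "not applicable" is carried over from the question, and lookup is a hypothetical helper.

```python
import sqlite3


def lookup(con, year, quarter, month):
    # quarter = 0 and month = 0 select the yearly value;
    # month = 0 alone selects a quarterly value.
    row = con.execute(
        "SELECT value FROM Table1 WHERE year = ? AND quarter = ? AND month = ?",
        (year, quarter, month),
    ).fetchone()
    return None if row is None else row[0]


con = sqlite3.connect(":memory:")
# The composite primary key is (year, quarter, month); Frequency is not stored.
con.execute(
    "CREATE TABLE Table1 (year INTEGER, quarter INTEGER, month INTEGER, "
    "value INTEGER, PRIMARY KEY (year, quarter, month))"
)
con.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [(2013, 1, 1, 1), (2013, 1, 0, 2), (2013, 0, 0, 3)],
)

yearly = lookup(con, 2013, 0, 0)
quarterly = lookup(con, 2013, 1, 0)
monthly = lookup(con, 2013, 1, 1)
print(yearly, quarterly, monthly)
```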