Query With Multiple Subqueries Efficiency? - sql

I have data that looks like this:
State Sex
---- ---
GA M
GA M
GA F
GA F
GA F
NY M
NY M
NY M
NY M
NY F
NY F
NY F
NY F
NY F
I want the result to be:
one row per state
col1 State
col2 count of Males
col3 count of Females
col4 total count by state
col5 percent Male by state
The query I am using is:
select t.state State,
M.count Male,
F.count Female,
count(t.state) Total,
CONCAT(ROUND(CAST(M.count as float)/CAST(count(t.state) as float)*100, 2), '%') as calc
from MyTable t
join
(
select state, count(sex) as count
from MyTable where sex ='M'
group by state) M
on t.state = M.state
join (
select state, count(sex) as count
from MyTable where sex ='F'
group by state) F
ON M.state = F.state
group by t.state, m.count, F.count;
The above query works, but I am wondering whether this is the most efficient way to do it.
This was done in SQL Server, but I think it should be the same for all RDBMSs.
The link is here: http://sqlfiddle.com/#!18/7a969/87

Use conditional aggregation:
select t.state,
sum(case when sex = 'M' then 1 else 0 end) as males,
sum(case when sex = 'F' then 1 else 0 end) as females,
count(*) as total,
avg(case when sex = 'M' then 1.0 else 0 end) as male_ratio
from MyTable t
group by t.state;
I would expect this to be the fastest method in just about any database.
Here is a SQL Fiddle.
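If the percentage should come back formatted like the calc column in the question, the same single-pass query can be extended. This is only a sketch for SQL Server, reusing CONCAT and ROUND from the question's own query; the alias pct_male is made up for illustration.

```sql
-- One scan of MyTable; no joins or subqueries.
select t.state,
       sum(case when t.sex = 'M' then 1 else 0 end) as males,
       sum(case when t.sex = 'F' then 1 else 0 end) as females,
       count(*) as total,
       -- avg over 1.0/0.0 is the male fraction; scale it to a percentage,
       -- round to two decimals, then trim the extra scale before concatenating
       concat(cast(round(100.0 * avg(case when t.sex = 'M' then 1.0 else 0.0 end), 2)
                   as decimal(5, 2)), '%') as pct_male
from MyTable t
group by t.state;
```

For the sample data the fractions are 2/5 for GA and 4/9 for NY, so the last column should show 40% and roughly 44.44%.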

You can compute the number of females by subtracting the number of males from the total count per state. This way, only a single join is required:
with r as (select t.state s, count(*) c from testtable t group by t.state)
select r1.s, t1.m males, r1.c - t1.m females, r1.c total, 100*(t1.m/r1.c) m_percent
from r r1
join (select t.state s, t.sex, count(*) m from testtable t group by t.state, t.sex) t1 on r1.s = t1.s
where t1.sex = 'M';
Output:
state males females total m_percent
----- ----- ------- ----- ---------
GA    2     3       5     40.0000
NY    4     5       9     44.4444
See demo.

It is not necessary to separate the data into derived tables for Male and Female.
From a performance point of view, the subqueries might help if they could make optimal use of indexes, but in practice the chances are low that you would be aggregating over only indexed values.
For this simple query you could use CASE expressions inline to express the Male/Female columns as flags (1 when the row matches) and SUM those values in a single aggregation. However, that would require you to write the CASE for Male twice: once for the Male column and once for the % Male column.
Instead of inline CASE we can use CROSS APPLY to evaluate the calculations once per row and reference the results by name:
SELECT t.state State,
SUM(Calcs.IsMale) Male,
SUM(Calcs.IsFemale) Female,
COUNT(1) Total,
CONCAT(ROUND(SUM(Calcs.IsMale)/CAST(COUNT(1) as float)*100, 2), '%') as Calc
FROM MyTable t
CROSS APPLY (SELECT
CASE Sex WHEN 'M' THEN 1 END as [IsMale]
,CASE Sex WHEN 'F' THEN 1 END as [IsFemale]
) as calcs
GROUP BY [State]
Is this any more efficient, though? In general it should be: this execution plan is much simpler than joining multiple aggregated sets, but it's hard to say without a much larger dataset to test against.
Either way, I would expect this simple CROSS APPLY version to win, as we only have to process the result set once.
When I run the original and the CROSS APPLY queries on the given dataset and look at the actual execution plans, SQL Server reports the CROSS APPLY query as 25% of the relative cost of the batch:
I apologize in advance for posting this as an image, not really sure if there is a better way to have this discussion
This execution plan reports that the Original Query is 3 times the cost of the CROSS APPLY version, probably due to the 3 table scans in the first query, compared to the single table scan in the CROSS APPLY version.

Related

I need to do a query from 2 tables using count function

The query contains 4 columns: the full name of the doctor, the number of male patients, the number of female patients, and the total number of patients seen by that doctor.
My problem is that I don't know how to count the number of males and females.
I am only supposed to use COUNT, GROUP BY and basic DML (I can't use CASE WHEN).
data in the table PACIENTE
er diagram
data in table medico
This depends on which database you are using specifically. One possible way to write this is:
SELECT
doc_name,
COUNT(CASE WHEN PAT_SEX = 'M' THEN 1 END) males,
COUNT(CASE WHEN PAT_SEX = 'F' THEN 1 END) females
FROM
...
Another common syntax for this is:
COUNT(IF PAT_SEX = 'M' THEN 1 ENDIF)
Some databases support this directly:
COUNTIF(PAT_SEX = 'M')
If you would really like to avoid any kind of conditional, then you could add gender to your groups but then you will have two rows for each doctor:
SELECT
doc_name,
pat_sex,
count(*)
FROM
...
GROUP BY
doc_name,
pat_sex
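If one row per doctor is required and CASE is off the table, one option is to lean on the fact that COUNT(expression) skips NULLs. The sketch below uses NULLIF for that; note that NULLIF is itself shorthand for a conditional, so whether it satisfies the assignment's restriction is an open question. The visits table, its rows and the doctor names are made up for illustration, and the query assumes pat_sex only ever holds 'M' or 'F'.

```sql
with visits (doc_name, pat_sex) as (
    -- hypothetical sample rows, one per patient seen
    select doc_name, pat_sex
    from (values ('Dr. A', 'M'),
                 ('Dr. A', 'F'),
                 ('Dr. A', 'F'),
                 ('Dr. B', 'M')) as v (doc_name, pat_sex)
)
select doc_name,
       count(nullif(pat_sex, 'F')) as males,    -- 'F' becomes NULL and is not counted
       count(nullif(pat_sex, 'M')) as females,  -- 'M' becomes NULL and is not counted
       count(*) as total
from visits
group by doc_name;
```

With these rows, Dr. A gets 1 male, 2 females and 3 in total; Dr. B gets 1, 0 and 1.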

Select only values from one column in a table in SQL with condition

Is it possible to get only the countries that played only in the pre round?
Country Round
Germany Pre Round
Germany Quater final
Spain Pre Round
Portugal Pre Round
I just want the countries that played only in the pre round. So the result should look like this:
Country
Spain
Portugal
You can group by country and set the conditions in the having clause:
select country
from tablename
group by country
having count(*) = 1 and max(round) = 'Pre Round'
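A variation on the same HAVING idea, in case a country can have more than one 'Pre Round' row (count(*) = 1 would exclude it): require that both the smallest and the largest round value are 'Pre Round'. This is a sketch against the sample rows from the question; the CTE name games is made up.

```sql
with games (country, round) as (
    select country, round
    from (values ('Germany',  'Pre Round'),
                 ('Germany',  'Quater final'),
                 ('Spain',    'Pre Round'),
                 ('Portugal', 'Pre Round')) as v (country, round)
)
select country
from games
group by country
-- min = max = 'Pre Round' means every row for the country is a pre-round row
having min(round) = 'Pre Round'
   and max(round) = 'Pre Round';
```

For the sample rows this keeps Spain and Portugal and drops Germany, whose largest round value is 'Quater final'.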
You can try the below using not exists
select country from c
where not exists
(select 1 from c as c1 where c.country=c1.country and roundval<>'Pre Round')
Two more for fun. The first is a variation on @forpas's answer: assign a numeric value to each round, representing the progression through the rounds, and then get the highest value for the country (which would be simpler if the rounds were stored separately with a round number):
select country
from your_table
group by country
having max(case round
when 'Pre Round' then 1
when 'Quater final' then 2
when 'Semi final' then 3
when 'Final' then 4
end) = 1;
If you wanted to find countries that were in the quarters but not semis then you just need to change to = 2, etc.
The second is overkill here, but could be useful to look for more complicated combinations in other types of data:
select country
from your_table
pivot (
count(*) for round in (
'Pre Round' as pre, 'Quater final' as quarter, 'Semi final' as semi, 'Final' as final
)
)
where pre = 1 and quarter = 0 and semi = 0 and final = 0;
Obviously in your example you wouldn't ever have quarter as 0 and then either semi or final as 1 - you can't get to those rounds without playing the quarters; but for other data you might want a mix.
You could use an inner join on a subquery for the countries with round 'Pre Round' and check the distinct count:
select m.Country
from my_table m
inner join (
select Country
from my_table
where round ='Pre Round'
) t on t.country = m.country
group by m.Country
having count(distinct m.round ) = 1

SQL Query to create a report of matches played

I was recently asked the SQL question below. Can someone help me with it? I have a table with three columns: Home Team, Away Team, and Winner Team, as shown below.
H_T A_T W_T
AUS IND IND
ENG AUS ENG
IND AUS AUS
AUS ENG AUS
ENG IND IND
IND ENG IND
The data above needs to be converted in SQL into a report with the following attributes:
Team Name, Total Matches Played, Win Count, Draw Count, Loss Count, Points.
To calculate points, these are the formulas for each kind (win/draw/loss)
Win = Win Count * 3
Draw = Draw Count * 1
Loss = Loss Count * 0
Points are the sum of the three values above.
Thanks in advance
I need a little more information about what defines a draw; this assumes the w_t column contains the value 'DRAW' instead of a team name.
Either way, you can use conditional aggregation to get your desired results. Normally you would have a teams table and join to it, but you can create that with a union and a subquery:
select t.*,
(win_count * 3) + (draw_count) as Points
from (
select t.team,
count(*) Total_Matches_Played,
count(case when t.team = y.w_t then 1 end) Win_Count,
count(case when t.team <> y.w_t and y.w_t <> 'DRAW' then 1 end) Loss_Count,
count(case when y.w_t = 'DRAW' then 1 end) Draw_Count
from (
select h_t as team from yourtable
union select a_t from yourtable
) t join yourtable y on t.team in (y.a_t,y.h_t)
group by t.team
) t
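To check the conditional-aggregation approach against the sample rows, here is a self-contained sketch. It assumes, as the answer does, that a drawn match is stored as 'DRAW' in w_t, and it counts a loss only when the match had a winner other than the team. The CTE names matches and teams are made up; the VALUES derived table works in SQL Server and PostgreSQL.

```sql
with matches (h_t, a_t, w_t) as (
    select h_t, a_t, w_t
    from (values ('AUS', 'IND', 'IND'),
                 ('ENG', 'AUS', 'ENG'),
                 ('IND', 'AUS', 'AUS'),
                 ('AUS', 'ENG', 'AUS'),
                 ('ENG', 'IND', 'IND'),
                 ('IND', 'ENG', 'IND')) as v (h_t, a_t, w_t)
),
teams as (
    -- every team that appears as home or away; UNION removes duplicates
    select h_t as team from matches
    union
    select a_t from matches
)
select s.team, s.played, s.wins, s.draws, s.losses,
       s.wins * 3 + s.draws as points
from (
    select tm.team,
           count(*) as played,
           count(case when m.w_t = tm.team then 1 end) as wins,
           count(case when m.w_t = 'DRAW' then 1 end) as draws,
           count(case when m.w_t <> tm.team and m.w_t <> 'DRAW' then 1 end) as losses
    from teams tm
    join matches m on tm.team in (m.h_t, m.a_t)  -- one row per match the team played
    group by tm.team
) s
order by points desc;
```

For the six sample matches this gives IND 4 played, 3 wins, 1 loss, 9 points; AUS 4 played, 2 wins, 2 losses, 6 points; ENG 4 played, 1 win, 3 losses, 3 points.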

Selecting count by row combinations

I'm struggling with what at first sight appeared to be a simple SQL query :)
I have the following table, which has three columns: PlayerId, Gender, Result (all of type integer).
What I'm trying to do is select the distinct players of gender 2 (male) with the count of each result.
There are about 50 possible results, so new table should have 51 columns:
|PlayerId | 1 | 2 | 3 | ... | 50 |
So I would like to see how many times each individual male (gender 2) player got specific result.
*** In case the question is still not entirely clear: after each game I insert a row with the player ID, gender and result (from 1 - 50) the player achieved in that game. Now I'd like to see how many times each player achieved specific results.
If there are 50 results and you want them in columns, then you are talking about a pivot. I tend to do these with conditional aggregation:
select player,
sum(case when result = 0 then 1 else 0 end) as result_00,
sum(case when result = 1 then 1 else 0 end) as result_01,
. . .
sum(case when result = 50 then 1 else 0 end) as result_50
from t
group by player;
You can choose a particular gender if you like, with where gender = 2. But why not calculate all at the same time?
try
select player, result, count(*)
from your_table
where Gender = 2
group by player, result;
select PlayerId from tablename where result = 'specific result you want' and gender = 2 group by PlayerId
The easiest way is to use pivoting:
;with cte as(Select * from t
Where gender = 2)
Select * from cte
Pivot(count(gender) for result in([1],[2],[3],....,[50]))p
Fiddle http://sqlfiddle.com/#!3/8dad5/3
One note: keeping gender in scores table is a bad idea. Better make a separate table for players and keep gender there.
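A rough sketch of what that split could look like. The table and column names (players, game_results, player_id) are made up for illustration, not taken from the question.

```sql
-- Gender is stored once per player instead of once per game.
create table players (
    player_id int primary key,
    gender    int not null
);

-- One row per game played; only the player and the result achieved.
create table game_results (
    player_id int not null references players (player_id),
    result    int not null
);

-- How many times each male (gender 2) player achieved each result.
select r.player_id, r.result, count(*) as times_achieved
from game_results r
join players p on p.player_id = r.player_id
where p.gender = 2
group by r.player_id, r.result;
```

The gender filter then comes from a join to players, so a player's gender cannot disagree between two score rows.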

sql combining 2 queries with different order by group by

I have a query where I am counting the most frequent response in a database and ranking them by highest amount so using group by and order by.
The following shows how to do it for one:
select health, count(health) as count
from [Health].[Questionaire]
group by Health
order by count(Health) desc
which outputs the following:
Health Count
----------- -----
Very Good 6
Good 5
Poor 4
I would like to run a similar query on another column of the same table and combine the two results in one SQL statement, like the following:
Health Count Diet Count
----------- ----- ----- -----
Very Good 6 Very Good 6
Good 5 Good 4
Poor 4 Poor 3
UPDATE!!
This is what the table looks like at the moment:
ID Diet Health
----------- ----- -------
101 Very Good Very Good
102 Poor Good
103 Poor Poor
For this data, I would like a single SQL statement to produce the following:
Health Count Diet Count
----------- ----- ----- -----
Very Good 2 Very Good 1
Poor 1 Good 1
Good 0 Poor 1
Can anyone please help me out with this one?
Can provide further clarification if needed!
Here are two different ways of doing it. Notice I removed the redundant column:
Test data:
DECLARE @t table(Health varchar(20), Diet varchar(20))
INSERT @t values
('Very good', 'Very good'),
('Poor', 'Good'),
('Poor', 'Poor')
Query 1:
;WITH CTE1 as
(
SELECT Health, count(*) CountHealth
FROM @t --[Health].[Questionaire]
GROUP BY health
), CTE2 as
(
SELECT Diet, count(*) CountDiet
FROM @t --[Health].[Questionaire]
GROUP BY Diet
)
SELECT
coalesce(Health, Diet) Grade,
coalesce(CountHealth, 0) CountHealth,
coalesce(CountDiet, 0) CountDiet
FROM CTE1
FULL JOIN
CTE2
ON CTE1.Health = CTE2.Diet
ORDER BY CountHealth DESC
Result 1:
Grade CountHealth CountDiet
Poor 2 1
Very good 1 1
Good 0 1
Mixing the results like that is really not good practice, so here is a different solution
Query 2:
SELECT Health, count(*) Count, 'Health' Grade
FROM @t --[Health].[Questionaire]
GROUP BY health
UNION ALL
SELECT Diet, count(*) CountDiet, 'Diet'
FROM @t --[Health].[Questionaire]
GROUP BY Diet
ORDER BY Grade, Count DESC
Result 2:
Health Count Grade
Good 1 Diet
Poor 1 Diet
Very good 1 Diet
Poor 2 Health
Very good 1 Health
You need to join the table to itself, and (as your sample data shows) you also need to handle values that have no matching rows.
If you have a table that holds the range of health/diet values:
select
v.value Status,
count(distinct a.id) healthCount, -- distinct: the two joins multiply rows for the same value
count(distinct b.id) DietCount
from health_diet_values v
left join Questionaire a on a.health = v.value
left join Questionaire b on b.diet = v.value
group by v.value
or if you don't have such a table, you need to generate the list of values manually and join from that:
select
v.value Status,
count(distinct a.id) healthCount, -- distinct: the two joins multiply rows for the same value
count(distinct b.id) DietCount
from (select 'Very Good' value union all
select 'Good' union all
select 'Poor') v
left join Questionaire a on a.health = v.value
left join Questionaire b on b.diet = v.value
group by v.value
Both of these queries produce zeroes if there is no matching data for the value.
Note that in your desired output you have a redundant column - you repeat the value column. The above queries produce output that looks like:
Status HealthCount DietCount
-------------------------------
Very Good 2 1
Good 1 1
Poor 0 1
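Another way to get both counts in one statement, without joining the table to itself, is to stack the two columns with UNION ALL and aggregate once. This is a sketch using the three rows from the question's update; the CTE names q and stacked and the flag columns are made up. Unlike the queries above, a value that appears in neither column produces no row at all rather than a row of zeroes.

```sql
with q (id, diet, health) as (
    select id, diet, health
    from (values (101, 'Very Good', 'Very Good'),
                 (102, 'Poor',      'Good'),
                 (103, 'Poor',      'Poor')) as v (id, diet, health)
),
stacked as (
    -- one row per answer, flagged by the column it came from
    select health as grade, 1 as is_health, 0 as is_diet from q
    union all
    select diet, 0, 1 from q
)
select grade,
       sum(is_health) as health_count,
       sum(is_diet)   as diet_count
from stacked
group by grade
order by health_count desc, grade;
```

For these three rows every grade has a health count of 1, and the diet counts are 0 for Good, 2 for Poor and 1 for Very Good.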