sqlite3: COUNT & EXCEPT not working as expected - sql

I'm fairly new to SQL but having searched the internet for an answer to this I still cannot get my COUNT and EXCEPT statements to select what I want.
My Database:
sqlite> CREATE TABLE Football(Team TEXT, Player TEXT, Age INTEGER, primary key(Team, Player));
sqlite> .separator ,
sqlite> .import databaseTest Football
sqlite> .headers on
sqlite> .mode col
sqlite> SELECT Team, Player, Age FROM Football ORDER BY Team;
Team Player Age
---------- ---------- ----------
Arsenal Cech 38
Arsenal Giroud 29
Arsenal Sanchez 28
Arsenal Walcott 27
Chelsea Costa 29
Chelsea Courtois 25
Chelsea Hazard 26
Chelsea Willian 26
Liverpool Can 23
Liverpool Coutinho 24
Liverpool Wjinaldum 25
Liverpool Woodburn 17
Manchester Aguero 29
Manchester Jesus 19
Manchester Silva 28
Manchester Toure 34
Manchester De Gea 26
Manchester Felliani 29
Manchester Rooney 32
Manchester Schweinste 35
Tottenham Delle Ali 22
Tottenham Kane 24
Tottenham Rose 24
Tottenham Vertonghen 27
What I want to do is SELECT the COUNT of teams that do not have a player over the age of 30. So the select statement should return 3 (Chelsea, Liverpool, Tottenham).
This is the statement I've tried and assumed would work:
sqlite> SELECT COUNT(DISTINCT Team) FROM Football
...> EXCEPT
...> SELECT COUNT(DISTINCT Team) FROM Football WHERE Age > 30;
COUNT(DISTINCT Team)
--------------------
6
But as you can see it returns '6'. What am I doing wrong and how can I get the correct result?

Here is another way. Look at the maximum age for each team:
SELECT COUNT(*)
FROM (SELECT Team
FROM Football
GROUP BY Team
HAVING MAX(Age) <= 30
) t;
You can also use EXCEPT, but this also requires a subquery. You need to do the set operation before doing the count:
SELECT COUNT(DISTINCT TEAM)
FROM (SELECT Team FROM Football
EXCEPT
SELECT Team FROM Football WHERE Age > 30
) t;
Strictly speaking, this query could use COUNT(*) rather than COUNT(DISTINCT). However, it can be troublesome to remember that EXCEPT (like UNION) removes duplicate values.
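To see why the original attempt returns the total team count, it may help to run both forms side by side. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the function name compare_queries and the reduced sample rows are assumptions made for illustration, not part of the original question. In the first query, EXCEPT compares two one-row results (the two counts), so the first count survives whenever the two numbers differ.

```python
import sqlite3

# Small subset of the question's rows, chosen for illustration only.
SAMPLE_ROWS = [
    ("Arsenal", "Cech", 38),
    ("Arsenal", "Giroud", 29),
    ("Chelsea", "Costa", 29),
    ("Chelsea", "Hazard", 26),
    ("Liverpool", "Can", 23),
    ("Liverpool", "Woodburn", 17),
]


def compare_queries(rows):
    """Return (rows from the original attempt, count from the corrected query)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Football(Team TEXT, Player TEXT, Age INTEGER, "
        "PRIMARY KEY(Team, Player))"
    )
    conn.executemany("INSERT INTO Football VALUES (?, ?, ?)", rows)

    # Original attempt: each side is a single number, so EXCEPT only
    # removes the first number if it happens to equal the second one.
    original = conn.execute(
        "SELECT COUNT(DISTINCT Team) FROM Football "
        "EXCEPT "
        "SELECT COUNT(DISTINCT Team) FROM Football WHERE Age > 30"
    ).fetchall()

    # Corrected form: subtract the sets of team names first, then count.
    corrected = conn.execute(
        "SELECT COUNT(*) FROM ("
        "  SELECT Team FROM Football "
        "  EXCEPT "
        "  SELECT Team FROM Football WHERE Age > 30"
        ") t"
    ).fetchone()[0]

    conn.close()
    return original, corrected


original_result, corrected_result = compare_queries(SAMPLE_ROWS)
print(original_result, corrected_result)
```

With this subset there are three teams and one of them (Arsenal) has a player over 30, so the original form yields the total of three while the corrected form yields two.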

Related

Adding rows in a table from data that is not in a column

I'm trying to create a table to add all Medals won by the participant countries in the Olympics.
I scraped the data from Wikipedia and have something similar to this:
Year  Country_Name  Host_city    Host_Country   Gold  Silver  Bronze
----  ------------  -----------  -------------  ----  ------  ------
1986  146           Los Angeles  United States  41    32      30
1986  67            Los Angeles  United States  12    12      12
And so on
I double-checked the data for some years, and it seems very accurate. The Country_Name has an ID because I have a Country_ID table that I created and updated the names with the ID:
Country_ID  Country_Name
----------  ------------
1986        1
1986        2
So far so good. Now I want to create a new table where I'll have all countries in a specific year and the total medals for that country. I managed to easily do that for countries that participated in an edition, here's an example for the 1896 edition:
INSERT INTO Cumultative_Medals_by_Year(Country_ID, Year, Culmutative_Gold, Culmutative_Silver, Culmutative_Bronze, Total_Medals)
SELECT a.Country_Name, a.Year, SUM(a.Gold) As Cumultative_Gold, SUM(a.Silver) As Cumultative_Silver, SUM(a.Bronze) As Cumultative_Bronze, SUM(a.Gold) + SUM(a.Silver) + SUM(a.Bronze) AS Total_Medals
FROM Country_Medals a
Where a.Year >= 1896 AND Year < 1900
Group By a.Country_Name, a.Year
And I'll have this table:
Country_ID  Year  Cumultative_Gold  Cumultative_Silver  Cumultative_Bronze  Total_Medals
----------  ----  ----------------  ------------------  ------------------  ------------
6           1986  2                 0                   0                   5
7           1986  2                 1                   2                   5
35          1986  1                 2                   3                   6
46          1986  5                 4                   2                   11
49          1986  6                 5                   2                   13
51          1986  2                 3                   2                   7
52          1986  10                18                  19                  47
58          1986  2                 1                   3                   6
85          1986  1                 0                   1                   2
131         1986  1                 2                   0                   3
146         1986  11                7                   2                   20
To add the other editions I just have to edit the dates, "Where a.Year >= 1900 AND Year < 1904", for example.
INSERT INTO Cumultative_Medals_by_Year(Country_ID, Year, Culmutative_Gold, Culmutative_Silver, Culmutative_Bronze, Total_Medals)
SELECT a.Country_Name, a.Year, SUM(a.Gold) As Cumultative_Gold, SUM(a.Silver) As Cumultative_Silver, SUM(a.Bronze) As Cumultative_Bronze, SUM(a.Gold) + SUM(a.Silver) + SUM(a.Bronze) AS Total_Medals
FROM Country_Medals a
Where a.Year >= 1900 AND Year < 1904
Group By a.Country_Name, a.Year
And the table will grow.
But I'd like to also add all the other countries for the year 1896. This way I'll have a full record of all countries. So for example, you see that Country 1 has no medals in the 1896 Olympic edition, but I'd like to also add it there, even if the sum becomes NULL (where I'll update with a 0).
Why do I want that? I'd like to do an Animated Bar Chart Race, and with the data I have, some countries go "away" from the race. For example, the US didn't participate in the 1980 Olympics, so for a brief moment, the bar for the US in the chart goes away just to return in 1984 (when it participated again). Another example is the Soviet Union: even though it no longer participates, it's the participant with the second most medals won (only behind the US), but as the country has no participation after 1988, the bar just goes away after that year. Keeping a record of medals for all countries in all editions would prevent that from happening.
I'm pretty sure there are lots of countries that have won medals that were not around in 1896. But if you want a row for every country and every year, then generate the rows you want using a cross join. Then left join in the available information:
select c.Country_Name, y.Year,
SUM(cm.Gold) As Cumulative_Gold,
SUM(cm.Silver) As Cumulative_Silver,
SUM(cm.Bronze) As Cumulative_Bronze,
COALESCE(SUM(cm.Gold), 0) + COALESCE(SUM(cm.Silver), 0) + COALESCE(SUM(cm.Bronze), 0) AS Total_Medals
from (select distinct year from Country_Medals) y cross join
(select distinct country_name from country_medals) c left join
country_medals cm
on cm.year = y.year and
cm.country_name = c.country_name
group By c.Country_Name, y.Year
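The effect of the cross join followed by the left join can be checked on a tiny data set. This is a sketch only, run through Python's sqlite3 module; the function name medals_for_every_country_and_year and the three sample rows are hypothetical, and COALESCE is applied to every sum so that missing combinations show 0 instead of NULL.

```python
import sqlite3

# Hypothetical sample: country 2 has no row for 1896.
SAMPLE_MEDALS = [
    (1896, 1, 2, 0, 1),
    (1900, 1, 1, 1, 0),
    (1900, 2, 3, 0, 0),
]


def medals_for_every_country_and_year(medal_rows):
    """Return one row per (country, year) pair, with zeros where no medals exist."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Country_Medals("
        "Year INTEGER, Country_Name INTEGER, "
        "Gold INTEGER, Silver INTEGER, Bronze INTEGER)"
    )
    conn.executemany("INSERT INTO Country_Medals VALUES (?, ?, ?, ?, ?)", medal_rows)

    rows = conn.execute(
        """
        SELECT c.Country_Name, y.Year,
               COALESCE(SUM(cm.Gold), 0),
               COALESCE(SUM(cm.Silver), 0),
               COALESCE(SUM(cm.Bronze), 0)
        FROM (SELECT DISTINCT Year FROM Country_Medals) y
        CROSS JOIN (SELECT DISTINCT Country_Name FROM Country_Medals) c
        LEFT JOIN Country_Medals cm
               ON cm.Year = y.Year
              AND cm.Country_Name = c.Country_Name
        GROUP BY c.Country_Name, y.Year
        ORDER BY c.Country_Name, y.Year
        """
    ).fetchall()
    conn.close()
    return rows


filled_rows = medals_for_every_country_and_year(SAMPLE_MEDALS)
for row in filled_rows:
    print(row)
```

Country 2 gets a row for 1896 with all zeros even though the source table has no such row, which is the gap-filling behaviour the bar chart race needs.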

Replace Id of one column by a name from another table while using the count statement?

I am trying to get the count of patients by province for my school project. I have managed to get the count and the Id of the province in a table, but since I am using the count statement, it will not let me use a join to show the ProvinceName instead of the Id (it says it's not numerical).
Here is the schema of the two tables I am talking about
The content of the Province table is as follows:
ProvinceId  ProvinceName               ProvinceShortName
----------  -------------------------  -----------------
1           Terre-Neuve-et-Labrador    NL
2           Île-du-Prince-Édouard      PE
3           Nouvelle-Écosse            NS
4           Nouveau-Brunswick          NB
5           Québec                     QC
6           Ontario                    ON
7           Manitoba                   MB
8           Saskatchewan               SK
9           Alberta                    AB
10          Colombie-Britannique       BC
11          Yukon                      YT
12          Territoires du Nord-Ouest  NT
13          Nunavut                    NU
And here is some sample data from the Patient table (don't worry, it's fake data!):
SS  FirstName  LastName  InsuranceNumber  InsuranceProvince  DateOfBirth  Sex  PhoneNumber
--  ---------  --------  ---------------  -----------------  -----------  ---  ------------
2   Doris      Patel     PATD778276       5                  1977-08-02   F    514-754-6488
3   Judith     Doe       DOEJ7712917      5                  1977-12-09   F    418-267-2263
4   Rosemary   Barrett   BARR05122566     6                  2005-12-25   F    905-638-5062
5   Cody       Kennedy   KENC047167       10                 2004-07-01   M    604-833-7712
I managed to get the patient count by province using the following statement:
select count(SS),InsuranceProvince
from Patient
full JOIN Province ON Patient.InsuranceProvince = Province.ProvinceId
group by InsuranceProvince
which gives me the following table:
PatientCount  InsuranceProvince
------------  -----------------
13            1
33            2
54            3
4             4
608           5
1778          6
25            7
209           8
547           9
649           10
6             11
35            12
24            13
How can I replace the id's with the correct ProvinceShortName to get the following final result?
ProvinceName  PatientCount
------------  ------------
NL            13
PE            33
NS            54
NB            4
QC            608
ON            1778
MB            25
SK            209
AB            547
BC            649
YT            6
NT            35
NU            24
Thanks in advance!
So you can actually just specify that in the select. Note that it's best practise to include the thing you group by in the select, but since your question is so specific then...
SELECT ProvinceShortName, COUNT(SS) AS PatientsInProvince
FROM Patient
JOIN Province ON Patient.InsuranceProvince=Province.ProvinceId
GROUP BY InsuranceProvince;
I would suggest:
select pr.ProvinceShortName, count(*)
from Patient p join
Province pr
on p.InsuranceProvince = pr.ProvinceId
group by pr.ProvinceShortName
order by min(pr.ProvinceId);
Notes:
The key is including the columns you want in the select and group by.
You seem to want the results in province number order, so I included an order by.
There is no need to count the non-NULL values of SS. You might as well use count(*).
Table aliases make the query easier to write and to read.
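The notes above can be tried out on a cut-down version of the two tables. The sketch below uses Python's sqlite3 module rather than the asker's actual database, keeps only the columns the query touches, and loads the four sample patients from the question; the function name patients_per_province is an assumption for illustration.

```python
import sqlite3

# Three of the provinces and the four sample patients from the question.
PROVINCES = [(5, "Québec", "QC"), (6, "Ontario", "ON"), (10, "Colombie-Britannique", "BC")]
PATIENTS = [(2, 5), (3, 5), (4, 6), (5, 10)]  # (SS, InsuranceProvince)


def patients_per_province(provinces, patients):
    """Return (ProvinceShortName, patient count) pairs in province-id order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Province("
        "ProvinceId INTEGER PRIMARY KEY, ProvinceName TEXT, ProvinceShortName TEXT)"
    )
    conn.execute("CREATE TABLE Patient(SS INTEGER PRIMARY KEY, InsuranceProvince INTEGER)")
    conn.executemany("INSERT INTO Province VALUES (?, ?, ?)", provinces)
    conn.executemany("INSERT INTO Patient VALUES (?, ?)", patients)

    # The displayed column is also the grouping column, so the query is
    # valid in strict engines; MIN(ProvinceId) restores province-id order.
    rows = conn.execute(
        """
        SELECT pr.ProvinceShortName, COUNT(*)
        FROM Patient p
        JOIN Province pr ON p.InsuranceProvince = pr.ProvinceId
        GROUP BY pr.ProvinceShortName
        ORDER BY MIN(pr.ProvinceId)
        """
    ).fetchall()
    conn.close()
    return rows


province_counts = patients_per_province(PROVINCES, PATIENTS)
print(province_counts)
```

Because this is an inner join, a province with no patients would not appear at all; an outer join counting a Patient column would be needed to show it with a count of 0.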
I assume that you need to show the patient count by province.
SELECT
Province.ProvinceShortName AS [ProvinceName]
,COUNT(Patient.SS) as [PatientCount] -- count a Patient column so provinces without patients show 0
FROM Patient
RIGHT JOIN Province ON Patient.InsuranceProvince = Province.ProvinceId
GROUP BY ProvinceShortName
Just altering your query to
select ProvinceShortName As ProvinceName,count(InsuranceProvince) As PatientCount
from Patient
full JOIN Province ON Patient.InsuranceProvince = Province.ProvinceId
group by ProvinceShortName

List the Id who appeared once only in Relational Algebra

Let's say there's a table called Winner, with 3 attributes: Name, Gender and Id.
Name Gender Id
Kevin Male 8
Kevin Male 8
Benny Male 31
Jenny Female 7
Louie Male 4
Peter Male 11
Kevin Male 2
Jenny Female 7
Jenny Female 7
Chris Male 23
Louie Female 14
Some rows belong to two different people who share a name, and some share a name but differ in gender, so the Id is the unique value that identifies each person. To list all the Ids that appear only once in the table, I am thinking of doing something like this:
Am I expressing it correctly ?
I don't know what your formula is trying to say, but in SQL you can achieve the result you want with a GROUP BY query:
SELECT Id, COUNT(Id) AS idCount
FROM Winner
GROUP BY Id
HAVING COUNT(Id) = 1
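As a quick check of the GROUP BY / HAVING approach, the Winner rows from the question can be loaded into an in-memory SQLite database through Python's sqlite3 module. This is only a sketch; the function name ids_appearing_once is hypothetical, and an ORDER BY is added so the output is deterministic.

```python
import sqlite3

# Rows copied from the Winner table in the question.
WINNER_ROWS = [
    ("Kevin", "Male", 8), ("Kevin", "Male", 8), ("Benny", "Male", 31),
    ("Jenny", "Female", 7), ("Louie", "Male", 4), ("Peter", "Male", 11),
    ("Kevin", "Male", 2), ("Jenny", "Female", 7), ("Jenny", "Female", 7),
    ("Chris", "Male", 23), ("Louie", "Female", 14),
]


def ids_appearing_once(rows):
    """Return the Ids that occur in exactly one row, in ascending order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Winner(Name TEXT, Gender TEXT, Id INTEGER)")
    conn.executemany("INSERT INTO Winner VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT Id FROM Winner GROUP BY Id HAVING COUNT(Id) = 1 ORDER BY Id"
    ).fetchall()
    conn.close()
    return [row[0] for row in result]


single_ids = ids_appearing_once(WINNER_ROWS)
print(single_ids)
```

Ids 8 and 7 are dropped because they occur two and three times respectively; every other Id occurs once.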

SQL Select Distinct returning duplicates

I am trying to return the country, golfer name, golfer age, and average drive for the golfers with the highest average drive from each country.
However, I am getting a result set with duplicates of the same country. What am I doing wrong? Here is my code:
select distinct country, name, age, avgdrive
from pga.golfers S1
inner join
(select max(avgdrive) as MaxDrive
from pga.golfers
group by country) S2
on S1.avgdrive = s2.MaxDrive
order by avgdrive;
These are the results I've been getting back. I should only be getting 15 rows, but instead I'm getting 20:
COUN NAME AGE AVGDRIVE
---- ------------------------------ ---------- ----------
Can Mike Weir 35 279.9
T&T Stephen Ames 41 285.8
USA Tim Petrovic 39 285.8
Ger Bernhard Langer 47 289.3
Swe Fredrik Jacobson 30 290
Jpn Ryuji Imada 28 290
Kor K.J. Choi 37 290.4
Eng Greg Owen 33 291.8
Ire Padraig Harrington 33 291.8
USA Scott McCarron 40 291.8
Eng Justin Rose 25 293.1
Ind Arjun Atwal 32 293.7
USA John Rollins 30 293.7
NIr Darren Clarke 37 294
Swe Daniel Chopra 31 297.2
Aus Adam Scott 25 300.6
Fij Vijay Singh 42 300.7
Spn Sergio Garcia 25 301.9
SAf Ernie Els 35 302.9
USA Tiger Woods 29 315.2
You are missing a join condition:
select s1.country, s1.name, s1.age, s1.avgdrive
from pga.golfers S1 inner join
(select country, max(avgdrive) as MaxDrive
from pga.golfers
group by country
) S2
on S1.avgdrive = s2.MaxDrive and s1.country = s2.country
order by s1.avgdrive;
Your problem is that some people in one country have the same average as the best in another country.
DISTINCT eliminates duplicate rows, not duplicate values in individual fields.
To get a list of countries with ages, names, and max drives, you would need to group the whole select by country.
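The difference the extra join condition makes can be reproduced with four of the golfers from the output above. The sketch below runs both joins through Python's sqlite3 module; the table is named golfers without the pga schema prefix, and the function name best_drivers is an assumption for illustration. Tim Petrovic (USA, 285.8) is not his country's longest driver, but his average equals the maximum for T&T, so he slips through when the join matches on the drive value alone.

```python
import sqlite3

# Four rows taken from the result listing in the question.
GOLFERS = [
    ("Can", "Mike Weir", 35, 279.9),
    ("T&T", "Stephen Ames", 41, 285.8),
    ("USA", "Tim Petrovic", 39, 285.8),
    ("USA", "Tiger Woods", 29, 315.2),
]


def best_drivers(rows, match_country):
    """Return golfer names joined to per-country maxima, with or without the country condition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE golfers(country TEXT, name TEXT, age INTEGER, avgdrive REAL)")
    conn.executemany("INSERT INTO golfers VALUES (?, ?, ?, ?)", rows)

    # The optional second condition ties each maximum to its own country.
    country_condition = "AND s1.country = s2.country" if match_country else ""
    result = conn.execute(
        f"""
        SELECT s1.name
        FROM golfers s1
        INNER JOIN (SELECT country, MAX(avgdrive) AS MaxDrive
                    FROM golfers
                    GROUP BY country) s2
                ON s1.avgdrive = s2.MaxDrive {country_condition}
        ORDER BY s1.avgdrive, s1.name
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in result]


without_country = best_drivers(GOLFERS, match_country=False)
with_country = best_drivers(GOLFERS, match_country=True)
print(without_country)
print(with_country)
```

Only the version with the country condition returns one golfer per country for this sample; ties within a country would still produce more than one row.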

SQL Server: Merge Data Rows in single table in output

I have a SQL Server table with the following fields and sample data:
ID Name Address Age
23052-PF Peter Timbuktu 25
23052-D1 Jane Paris 22
23052-D2 David London 24
23050-PF Sam Beijing 22
23051-PF Nancy NYC 26
23051-D1 Carson Cali 22
23056-PF Grace LA 28
23056-D1 Smith Boston 23
23056-D2 Mark Adelaide 26
23056-D3 Hose Mexico 25
23056-D4 Mandy Victoria 24
Each ID with -PF is unique in the table.
Each ID with the -Dx is related to the same ID with the -PF.
Each ID with -PF may have 0 or more IDs with -Dx.
The maximum number of -Dx rows for a given -PF is 9.
i.e. an ID 11111-PF can have 11111-D1, 11111-D2, 11111-D3 up to 11111-D9.
Output expected for above sample data:
ID ID (without suffix) PF_Name PF_Address PF_Age D_Name D_Address D_Age
23052-PF 23052 Peter Timbuktu 25 Jane Paris 22
23052-PF 23052 Peter Timbuktu 25 David London 24
23050-PF 23050 Sam Beijing 22 NULL NULL NULL
23051-PF 23051 Nancy NYC 26 Carson Cali 22
23056-PF 23056 Grace LA 28 Smith Boston 23
23056-PF 23056 Grace LA 28 Mark Adelaide 26
23056-PF 23056 Grace LA 28 Hose Mexico 25
23056-PF 23056 Grace LA 28 Mandy Victoria 24
I need to be able to join the -PF and -Dx as above.
If a -PF has 0 Dx rows, then D_Name, D_Address and D_Age columns in the output should return NULL.
If a -PF has one or more Dx rows, then PF_Name, PF_Address and PF_Age should repeat for each row in the output and D_Name, D_Address and D_Age should contain the values from each related Dx row.
Need to use MSSQL.
Query should not use views or create additional tables.
Thanks for all your help!
select
pf.ID,
pf.IDNum,
pf.Name as PF_Name,
pf.Address as PF_Address,
pf.Age as PF_Age,
dx.Name as D_Name,
dx.Address as D_Address,
dx.Age as D_Age
from
(
select
ID, left(ID, 5) as IDNum, Name, Address, Age
from
mytable
where
right(ID, 3) = '-PF'
) pf
left outer join
(
select
ID, left(ID, 5) as IDNum, Name, Address, Age
from
mytable
where
right(ID, 3) != '-PF'
) dx
on pf.IDNum = dx.IDNum
SqlFiddle demo: http://sqlfiddle.com/#!6/dfdbb/1
SELECT t1.ID, LEFT(t1.ID,5) "ID (without Suffix)",
t1.Name "PF_Name", t1.Address "PF_Address", t1.Age "PF_Age",
t2.Name "D_Name", t2.Address "D_Address", t2.Age "D_Age"
FROM PFTable t1
LEFT JOIN PFTable t2 on LEFT(t1.ID,5) = LEFT(t2.ID,5) AND RIGHT(t2.ID,2) <> 'PF' -- keep the -PF row from joining to itself
WHERE RIGHT(t1.ID,2) = 'PF'
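The self-join idea can be tried outside SQL Server as well. The following sketch uses Python's sqlite3 module, so it swaps LEFT and RIGHT for SQLite's substr (an assumption forced by the engine, not part of either answer) and selects only the name columns to stay short; the function name join_pf_with_dependents is hypothetical. The condition excluding -PF rows sits in the ON clause, so a -PF row with no -Dx rows still appears with NULL on the right side.

```python
import sqlite3

# Four rows from the sample data in the question.
SAMPLE_ROWS = [
    ("23052-PF", "Peter", "Timbuktu", 25),
    ("23052-D1", "Jane", "Paris", 22),
    ("23052-D2", "David", "London", 24),
    ("23050-PF", "Sam", "Beijing", 22),
]


def join_pf_with_dependents(rows):
    """Return (ID, ID without suffix, PF_Name, D_Name) for every -PF row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable(ID TEXT PRIMARY KEY, Name TEXT, Address TEXT, Age INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

    # substr(ID, 1, 5) plays the role of LEFT(ID, 5);
    # substr(ID, -2) plays the role of RIGHT(ID, 2).
    result = conn.execute(
        """
        SELECT pf.ID, substr(pf.ID, 1, 5), pf.Name, d.Name
        FROM mytable pf
        LEFT JOIN mytable d
               ON substr(d.ID, 1, 5) = substr(pf.ID, 1, 5)
              AND substr(d.ID, -2) <> 'PF'
        WHERE substr(pf.ID, -2) = 'PF'
        ORDER BY pf.ID, d.ID
        """
    ).fetchall()
    conn.close()
    return result


joined_rows = join_pf_with_dependents(SAMPLE_ROWS)
for row in joined_rows:
    print(row)
```

Sam (23050-PF) has no -Dx rows and comes back with None for D_Name, while Peter repeats once per dependent row.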