SQL Select Distinct returning duplicates

I am trying to return the country, golfer name, golfer age, and average drive for the golfer with the highest average drive in each country.
However, I am getting a result set with duplicates of the same country. What am I doing wrong? Here is my code:
select distinct country, name, age, avgdrive
from pga.golfers S1
inner join
(select max(avgdrive) as MaxDrive
from pga.golfers
group by country) S2
on S1.avgdrive = s2.MaxDrive
order by avgdrive;
These are the results I've been getting back. I should only be getting 15 rows, but instead I'm getting 20:
COUN NAME AGE AVGDRIVE
---- ------------------------------ ---------- ----------
Can Mike Weir 35 279.9
T&T Stephen Ames 41 285.8
USA Tim Petrovic 39 285.8
Ger Bernhard Langer 47 289.3
Swe Fredrik Jacobson 30 290
Jpn Ryuji Imada 28 290
Kor K.J. Choi 37 290.4
Eng Greg Owen 33 291.8
Ire Padraig Harrington 33 291.8
USA Scott McCarron 40 291.8
Eng Justin Rose 25 293.1
Ind Arjun Atwal 32 293.7
USA John Rollins 30 293.7
NIr Darren Clarke 37 294
Swe Daniel Chopra 31 297.2
Aus Adam Scott 25 300.6
Fij Vijay Singh 42 300.7
Spn Sergio Garcia 25 301.9
SAf Ernie Els 35 302.9
USA Tiger Woods 29 315.2

You are missing a join condition:
select s1.country, s1.name, s1.age, s1.avgdrive
from pga.golfers S1 inner join
(select country, max(avgdrive) as MaxDrive
from pga.golfers
group by country
) S2
on S1.avgdrive = s2.MaxDrive and s1.country = s2.country
order by s1.avgdrive;
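Without the country condition, a golfer is returned whenever their average drive equals the maximum of any country. Your problem is that some golfers in one country have the same average as the best golfer in another country (for example, Greg Owen (Eng) and Scott McCarron (USA) both have 291.8, which is Ireland's maximum).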
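To see the difference concretely, here is a small sketch using Python's built-in sqlite3 module and five rows copied from the result set above. The in-memory table name golfers is an assumption standing in for pga.golfers. The original join returns every golfer whose drive equals any country's maximum; adding the country condition leaves one row per country.

```python
import sqlite3

# Five rows copied from the question's result set: (country, name, age, avgdrive).
ROWS = [
    ("Eng", "Greg Owen", 33, 291.8),
    ("Ire", "Padraig Harrington", 33, 291.8),
    ("USA", "Scott McCarron", 40, 291.8),
    ("Eng", "Justin Rose", 25, 293.1),
    ("USA", "Tiger Woods", 29, 315.2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE golfers (country TEXT, name TEXT, age INTEGER, avgdrive REAL)")
conn.executemany("INSERT INTO golfers VALUES (?, ?, ?, ?)", ROWS)

# Original query: joins only on the drive value, so any golfer whose drive
# equals *some* country's maximum is returned.
buggy = conn.execute("""
    SELECT DISTINCT country, name, age, avgdrive
    FROM golfers s1
    JOIN (SELECT MAX(avgdrive) AS maxdrive FROM golfers GROUP BY country) s2
      ON s1.avgdrive = s2.maxdrive
    ORDER BY avgdrive, name
""").fetchall()

# Fixed query: the subquery also returns the country, and the join uses both columns.
fixed = conn.execute("""
    SELECT s1.country, s1.name, s1.age, s1.avgdrive
    FROM golfers s1
    JOIN (SELECT country, MAX(avgdrive) AS maxdrive FROM golfers GROUP BY country) s2
      ON s1.avgdrive = s2.maxdrive AND s1.country = s2.country
    ORDER BY s1.avgdrive
""").fetchall()

print(len(buggy), len(fixed))  # 5 3
```

With these rows, Greg Owen and Scott McCarron only appear in the first result because 291.8 happens to be Ireland's maximum.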

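DISTINCT eliminates duplicate rows, not duplicate values in individual columns.
To get one row per country with the name, age, and maximum drive, you need to pick each country's best row, for example by grouping by country in a subquery and joining back on both country and drive; DISTINCT on its own cannot do that.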

Related

Oracle 11g - select only one row for each town in a table

I have a table with 300K records, but only ~100 unique town names. I need SQL to return one row for each town name. Table structure and sample data:
UNIQUE_ID  STREET_NUMBER  STREET_NAME  STREET_TYPE  TOWN       ZIP
UID01      11             TROY         STREET       ASHFIELD   2017
UID02      13             ABED         ROAD         ASHFIELD   2017
UID03      2              FRANK        COURT        EMERTON    2021
UID04      8              DENNIS       GROVE        SACKVILLE  2028
UID05      97             MAC          CRESCENT     SACKVILLE  2028
UID06      102            CHARLIE      WALK         SACKVILLE  2028
UID07      70             DEE          BOULEVARD    WINDSOR    2033
UID08      27             POPPY        STREET       WINDSOR    2033
UID09      33             ALLY         WAY          BARGO      2315
UID10      48             ELS          AVENUE       BARGO      2315
I'm trying to get the data returned to be something like:
UNIQUE_ID  STREET_NUMBER  STREET_NAME  STREET_TYPE  TOWN       ZIP
UID01      11             TROY         STREET       ASHFIELD   2017
UID03      2              FRANK        COURT        EMERTON    2021
UID04      8              DENNIS       GROVE        SACKVILLE  2028
UID07      70             DEE          BOULEVARD    WINDSOR    2033
UID09      33             ALLY         WAY          BARGO      2315
I don't care which record is returned for each town name, but I need one record for each town.
I've trawled through various similar posts but can't seem to get the syntax correct.
I'm able to select each individual town name using this:
select min(TOWN) keep (dense_rank first order by rownum) TOWN
from ADDRESS_TABLE group by TOWN;
But I'm not sure how to return the other columns as well.
Help please?
If you don't care which one to take, then take any of them (e.g. the first by unique_id):
WITH temp AS
(SELECT unique_id,
street_number,
street_name,
street_type,
town,
zip,
ROW_NUMBER () OVER (PARTITION BY town ORDER BY unique_id) rn
FROM address_table)
SELECT unique_id,
street_number,
street_name,
street_type,
town,
zip
FROM temp
WHERE rn = 1
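As a quick check, the sketch below runs the same ROW_NUMBER() query against the ten sample rows from the question, using Python's sqlite3 module with an in-memory database instead of Oracle (an assumption; window functions need SQLite 3.25 or later). The final ORDER BY unique_id is added only to make the output order predictable.

```python
import sqlite3

# Sample rows from the question: (unique_id, street_number, street_name, street_type, town, zip).
ROWS = [
    ("UID01", 11, "TROY", "STREET", "ASHFIELD", 2017),
    ("UID02", 13, "ABED", "ROAD", "ASHFIELD", 2017),
    ("UID03", 2, "FRANK", "COURT", "EMERTON", 2021),
    ("UID04", 8, "DENNIS", "GROVE", "SACKVILLE", 2028),
    ("UID05", 97, "MAC", "CRESCENT", "SACKVILLE", 2028),
    ("UID06", 102, "CHARLIE", "WALK", "SACKVILLE", 2028),
    ("UID07", 70, "DEE", "BOULEVARD", "WINDSOR", 2033),
    ("UID08", 27, "POPPY", "STREET", "WINDSOR", 2033),
    ("UID09", 33, "ALLY", "WAY", "BARGO", 2315),
    ("UID10", 48, "ELS", "AVENUE", "BARGO", 2315),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE address_table (unique_id TEXT, street_number INTEGER, "
    "street_name TEXT, street_type TEXT, town TEXT, zip INTEGER)"
)
conn.executemany("INSERT INTO address_table VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Number the rows within each town by unique_id, then keep only the first one.
query = """
WITH temp AS (
    SELECT unique_id, street_number, street_name, street_type, town, zip,
           ROW_NUMBER() OVER (PARTITION BY town ORDER BY unique_id) AS rn
    FROM address_table
)
SELECT unique_id, street_number, street_name, street_type, town, zip
FROM temp
WHERE rn = 1
ORDER BY unique_id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Changing the ORDER BY inside OVER (...) changes which row is considered "first" for each town.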

Replace Id of one column by a name from another table while using the count statement?

I am trying to get the count of patients by province for my school project. I have managed to get the count and the ID of the province in a table, but since I am using the count statement, it will not let me use a join to show the ProvinceName instead of the ID (it says it's not numerical).
Here is the schema of the two tables I am talking about
The content of the Province table is as follows:
ProvinceId  ProvinceName               ProvinceShortName
1           Terre-Neuve-et-Labrador    NL
2           Île-du-Prince-Édouard      PE
3           Nouvelle-Écosse            NS
4           Nouveau-Brunswick          NB
5           Québec                     QC
6           Ontario                    ON
7           Manitoba                   MB
8           Saskatchewan               SK
9           Alberta                    AB
10          Colombie-Britannique       BC
11          Yukon                      YT
12          Territoires du Nord-Ouest  NT
13          Nunavut                    NU
And here is some sample data from the Patient table (don't worry, it's fake data!):
SS  FirstName  LastName  InsuranceNumber  InsuranceProvince  DateOfBirth  Sex  PhoneNumber
2   Doris      Patel     PATD778276       5                  1977-08-02   F    514-754-6488
3   Judith     Doe       DOEJ7712917      5                  1977-12-09   F    418-267-2263
4   Rosemary   Barrett   BARR05122566     6                  2005-12-25   F    905-638-5062
5   Cody       Kennedy   KENC047167       10                 2004-07-01   M    604-833-7712
I managed to get the patient count by province using the following statement:
select count(SS),InsuranceProvince
from Patient
full JOIN Province ON Patient.InsuranceProvince = Province.ProvinceId
group by InsuranceProvince
which gives me the following table:
PatientCount  InsuranceProvince
13            1
33            2
54            3
4             4
608           5
1778          6
25            7
209           8
547           9
649           10
6             11
35            12
24            13
How can I replace the IDs with the correct ProvinceShortName to get the following final result?
ProvinceName  PatientCount
NL            13
PE            33
NS            54
NB            4
QC            608
ON            1778
MB            25
SK            209
AB            547
BC            649
YT            6
NT            35
NU            24
Thanks in advance!
So you can actually just specify that in the select. Note that every non-aggregated column in the select must also appear in the GROUP BY (many databases, including SQL Server, reject the query otherwise), so group by the short name:
SELECT ProvinceShortName, COUNT(SS) AS PatientsInProvince
FROM Patient
JOIN Province ON Patient.InsuranceProvince = Province.ProvinceId
GROUP BY ProvinceShortName;
I would suggest:
select pr.ProvinceShortName, count(*)
from Patient p join
Province pr
on p.InsuranceProvince = pr.ProvinceId
group by pr.ProvinceShortName
order by min(pr.ProvinceId);
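A minimal sketch of this query, run with Python's sqlite3 against the province list and the four sample patients from the question. ProvinceName and the other Patient columns are left out (an assumption for brevity, since the count does not need them).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Province (ProvinceId INTEGER PRIMARY KEY, ProvinceShortName TEXT)")
conn.execute("CREATE TABLE Patient (SS INTEGER PRIMARY KEY, InsuranceProvince INTEGER)")

# Province IDs and short names from the question (ProvinceName omitted).
conn.executemany("INSERT INTO Province VALUES (?, ?)", [
    (1, "NL"), (2, "PE"), (3, "NS"), (4, "NB"), (5, "QC"), (6, "ON"), (7, "MB"),
    (8, "SK"), (9, "AB"), (10, "BC"), (11, "YT"), (12, "NT"), (13, "NU"),
])
# The four sample patients from the question: (SS, InsuranceProvince).
conn.executemany("INSERT INTO Patient VALUES (?, ?)", [(2, 5), (3, 5), (4, 6), (5, 10)])

# Group by the column that is selected; order by the province number.
rows = conn.execute("""
    SELECT pr.ProvinceShortName, COUNT(*)
    FROM Patient p
    JOIN Province pr ON p.InsuranceProvince = pr.ProvinceId
    GROUP BY pr.ProvinceShortName
    ORDER BY MIN(pr.ProvinceId)
""").fetchall()
print(rows)  # [('QC', 2), ('ON', 1), ('BC', 1)]
```

Because this is an inner join, provinces with no patients do not appear at all.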
Notes:
The key is including the columns you want in the select and group by.
You seem to want the results in province number order, so I included an order by.
There is no need to count the non-NULL values of SS. You might as well use count(*).
Table aliases make the query easier to write and to read.
I assume that you need to show the patient count by province, including provinces with no patients.
SELECT
Province.ProvinceShortName AS [ProvinceName]
,COUNT(Patient.SS) AS [PatientCount] -- counts only matched patients, so empty provinces show 0
FROM Patient
RIGHT JOIN Province ON Patient.InsuranceProvince = Province.ProvinceId
GROUP BY ProvinceShortName
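With an outer join that keeps every province, what you count matters. The sketch below uses Python's sqlite3 and a hypothetical three-province subset with no patients in NL. It writes the join as Province LEFT JOIN Patient, which is equivalent to Patient RIGHT JOIN Province and also works on SQLite versions without RIGHT JOIN. COUNT(*) (like COUNT(1)) counts the NULL-extended row for NL as 1, while COUNT(p.SS) gives 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Province (ProvinceId INTEGER PRIMARY KEY, ProvinceShortName TEXT)")
conn.execute("CREATE TABLE Patient (SS INTEGER PRIMARY KEY, InsuranceProvince INTEGER)")

# Hypothetical subset: three provinces, and no patient insured in NL.
conn.executemany("INSERT INTO Province VALUES (?, ?)", [(1, "NL"), (5, "QC"), (6, "ON")])
conn.executemany("INSERT INTO Patient VALUES (?, ?)", [(2, 5), (3, 5), (4, 6)])

# Province LEFT JOIN Patient keeps every province, like Patient RIGHT JOIN Province.
query = """
    SELECT pr.ProvinceShortName,
           COUNT(*)    AS star_count,     -- counts the NULL-extended row too
           COUNT(p.SS) AS patient_count   -- ignores NULLs, so NL gets 0
    FROM Province pr
    LEFT JOIN Patient p ON p.InsuranceProvince = pr.ProvinceId
    GROUP BY pr.ProvinceShortName
    ORDER BY MIN(pr.ProvinceId)
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('NL', 1, 0), ('QC', 2, 2), ('ON', 1, 1)]
```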
Just alter your query to:
select ProvinceShortName As ProvinceName, count(InsuranceProvince) As PatientCount
from Patient
full JOIN Province ON Patient.InsuranceProvince = Province.ProvinceId
group by ProvinceShortName

Join multiple tables and pick results from most recent table

I have 4 tables. I want all the rows and columns from my first table, tbl_2021, plus the rows whose id is not in tbl_2021 but is present in one of the other 3 tables, chosen by one rule:
If an id exists in tbl_2020, tbl_2019 and tbl_2018, I need the id and its details from the most recent table, tbl_2020.
If an id is in both the 2019 and 2018 tables, I need the data from 2019, and so on. If it is in 2020 and 2018, then from 2020, and so on.
If the same id is in 2021, 2020, 2019 and 2018, the data from 2021 is selected.
I come from a shell scripting background and have just started with SQL, so if anyone could tell me the approach, or what I should do to put these pieces together, I would be very grateful. Thank you!
tbl_2021
id   name   addr   location  country   contintent  gdp
123  rob    dware  texas     us        us          8
456  lilly  gwood  london    uk        uk          5
670  rick   utown  newyrok   us        us          8
490  zang   kcity  hk        hongkong  hongkong    6
tbl_2020
id   location  name
999  ger       roger
888  bel       leslie
670  us        marie
tbl_2019
id   location  name     data  network
999  uk        roger    xx    na
555  rus       vladmir  ux    na
879  us        marie    xx    ua
481  cn        kim
tbl_2018
id   location  name     data  network
823  uk        roger    xx    na
555  rus       vladmir  ux    na
879  us        maria    xx    ua
670  us        marie    xy    uy
888  in        raj      xx    jo
output:
id   name     addr   location  country   contintent  gdp
123  rob      dware  texas     us        us          8
456  lilly    gwood  london    uk        uk          5
670  rick     utown  newyrok   us        us          8
490  zang     kcity  hk        hongkong  hongkong    6
999  roger           ger
888  leslie          bel
555  vladmir         rus
879  marie           us
481  kim             cn
823  roger           uk
First, you should fix your data model. It is not a good idea to store such data in separate tables. Instead, you should store it in a single table with a year column.
Second, I think you can solve your problem using full join, but it is a little tricky:
select coalesce(t21.id, t20.id, t19.id, t18.id) as id,
coalesce(t21.name, t20.name, t19.name, t18.name) as name,
t21.addr,
. . .
from tbl_2021 t21 full join
tbl_2020 t20
on t21.id = t20.id full join
tbl_2019 t19
on t19.id = coalesce(t21.id, t20.id) full join
tbl_2018 t18
on t18.id = coalesce(t21.id, t20.id, t19.id);
You need to carefully figure out how the columns should be pulled from the different tables.
First, combine the data from all four tables with UNION ALL. Then use row_number() to number the rows for each id from the most recent table to the oldest. Finally, select one row for each id: the one from the most recent year.
with cte as
(
select id, name, addr, location, country, contintent, gdp, data, network,
row_number() over (partition by id order by sl) rn -- sl ranks the tables from newest (1) to oldest (4)
from
(
select id, name, addr, location, country, contintent, gdp, null data, null network, 1 sl from tbl_2021
union all
select id, name, null addr, location, null country, null contintent, null gdp, null data, null network, 2 sl from tbl_2020
union all
select id, name, null addr, location, null country, null contintent, null gdp, data, network, 3 sl from tbl_2019
union all
select id, name, null addr, location, null country, null contintent, null gdp, data, network, 4 sl from tbl_2018
) t
)
select id, name, addr, location, country, contintent, gdp, data, network from cte where rn = 1;
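Here is a trimmed-down sketch of the UNION ALL plus ROW_NUMBER() approach, run with Python's sqlite3 against the sample rows from the question. Only the id, name and location columns are modelled (an assumption to keep it short), and sl ranks the tables from newest (1) to oldest (4), so rn = 1 keeps each id's row from its most recent table. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Sample rows from the question as (id, name, location); other columns omitted.
tables = {
    "tbl_2021": [(123, "rob", "texas"), (456, "lilly", "london"),
                 (670, "rick", "newyrok"), (490, "zang", "hk")],
    "tbl_2020": [(999, "roger", "ger"), (888, "leslie", "bel"), (670, "marie", "us")],
    "tbl_2019": [(999, "roger", "uk"), (555, "vladmir", "rus"),
                 (879, "marie", "us"), (481, "kim", "cn")],
    "tbl_2018": [(823, "roger", "uk"), (555, "vladmir", "rus"), (879, "maria", "us"),
                 (670, "marie", "us"), (888, "raj", "in")],
}
for table, rows in tables.items():
    conn.execute(f"CREATE TABLE {table} (id INTEGER, name TEXT, location TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)

query = """
WITH cte AS (
    SELECT id, name, location,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY sl) AS rn
    FROM (
        SELECT id, name, location, 1 AS sl FROM tbl_2021
        UNION ALL SELECT id, name, location, 2 FROM tbl_2020
        UNION ALL SELECT id, name, location, 3 FROM tbl_2019
        UNION ALL SELECT id, name, location, 4 FROM tbl_2018
    ) t
)
SELECT id, name, location FROM cte WHERE rn = 1 ORDER BY id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The rows it returns match the question's expected output, for example 670 comes from tbl_2021 (rick) and 879 from tbl_2019 (marie).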

sqlite3: COUNT & EXCEPT not working as expected

I'm fairly new to SQL, and even after searching the internet for an answer I still cannot get my COUNT and EXCEPT statements to select what I want.
My Database:
sqlite> CREATE TABLE Football(Team TEXT, Player TEXT, Age INTEGER, primary key(Team, Player));
sqlite> .separator ,
sqlite> .import databaseTest Football
sqlite> .headers on
sqlite> .mode col
sqlite> SELECT Team, Player, Age FROM Football ORDER BY Team;
Team Player Age
---------- ---------- ----------
Arsenal Cech 38
Arsenal Giroud 29
Arsenal Sanchez 28
Arsenal Walcott 27
Chelsea Costa 29
Chelsea Courtois 25
Chelsea Hazard 26
Chelsea Willian 26
Liverpool Can 23
Liverpool Coutinho 24
Liverpool Wjinaldum 25
Liverpool Woodburn 17
Manchester Aguero 29
Manchester Jesus 19
Manchester Silva 28
Manchester Toure 34
Manchester De Gea 26
Manchester Felliani 29
Manchester Rooney 32
Manchester Schweinste 35
Tottenham Delle Ali 22
Tottenham Kane 24
Tottenham Rose 24
Tottenham Vertonghen 27
What I want to do is SELECT the COUNT of teams that do not have a player over the age of 30. So the query should return 3 (Chelsea, Liverpool, Tottenham).
This is the statement I've tried and assumed would work:
sqlite> SELECT COUNT(DISTINCT Team) FROM Football
...> EXCEPT
...> SELECT COUNT(DISTINCT Team) FROM Football WHERE Age > 30;
COUNT(DISTINCT Team)
--------------------
6
But as you can see it returns '6'. What am I doing wrong and how can I get the correct result?
Here is another way. Look at the maximum age for each team:
SELECT COUNT(*)
FROM (SELECT Team
FROM Football
GROUP BY Team
HAVING MAX(Age) <= 30
) t;
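You can also use EXCEPT, but this also requires a subquery. In your query, each side is counted first, so EXCEPT compares two single numbers and, because they differ, returns the first one (6). You need to do the set operation before doing the count: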
SELECT COUNT(DISTINCT TEAM)
FROM (SELECT Team FROM Football
EXCEPT
SELECT Team FROM Football WHERE Age > 30
) t;
Strictly speaking, this query could use COUNT(*) rather than COUNT(DISTINCT), because EXCEPT (like UNION) already removes duplicate rows. However, that is easy to forget, so COUNT(DISTINCT) makes the intent explicit.
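Both corrected queries can be checked with Python's sqlite3 module against the sample data. One caveat: the .mode col display appears to truncate team names to 10 characters, so the rows as displayed contain a single "Manchester" team rather than two separate clubs. Loaded that way, the original query returns 5 here (the question's database returned 6), but both corrected queries still return 3.

```python
import sqlite3

# Rows exactly as displayed in the question (team names possibly truncated).
ROWS = [
    ("Arsenal", "Cech", 38), ("Arsenal", "Giroud", 29), ("Arsenal", "Sanchez", 28),
    ("Arsenal", "Walcott", 27), ("Chelsea", "Costa", 29), ("Chelsea", "Courtois", 25),
    ("Chelsea", "Hazard", 26), ("Chelsea", "Willian", 26), ("Liverpool", "Can", 23),
    ("Liverpool", "Coutinho", 24), ("Liverpool", "Wjinaldum", 25), ("Liverpool", "Woodburn", 17),
    ("Manchester", "Aguero", 29), ("Manchester", "Jesus", 19), ("Manchester", "Silva", 28),
    ("Manchester", "Toure", 34), ("Manchester", "De Gea", 26), ("Manchester", "Felliani", 29),
    ("Manchester", "Rooney", 32), ("Manchester", "Schweinste", 35), ("Tottenham", "Delle Ali", 22),
    ("Tottenham", "Kane", 24), ("Tottenham", "Rose", 24), ("Tottenham", "Vertonghen", 27),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Football(Team TEXT, Player TEXT, Age INTEGER, primary key(Team, Player))")
conn.executemany("INSERT INTO Football VALUES (?, ?, ?)", ROWS)

# Original attempt: counts first, then EXCEPT compares the two numbers.
naive = conn.execute("""
    SELECT COUNT(DISTINCT Team) FROM Football
    EXCEPT
    SELECT COUNT(DISTINCT Team) FROM Football WHERE Age > 30
""").fetchone()[0]

# Teams whose oldest player is at most 30.
having_count = conn.execute("""
    SELECT COUNT(*) FROM (SELECT Team FROM Football GROUP BY Team HAVING MAX(Age) <= 30) t
""").fetchone()[0]

# Set operation on team names first, then count.
except_count = conn.execute("""
    SELECT COUNT(DISTINCT Team)
    FROM (SELECT Team FROM Football EXCEPT SELECT Team FROM Football WHERE Age > 30) t
""").fetchone()[0]

print(naive, having_count, except_count)  # 5 3 3
```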

SQL Server: Merge Data Rows in single table in output

I have a SQL Server table with the following fields and sample data:
ID Name Address Age
23052-PF Peter Timbuktu 25
23052-D1 Jane Paris 22
23052-D2 David London 24
23050-PF Sam Beijing 22
23051-PF Nancy NYC 26
23051-D1 Carson Cali 22
23056-PF Grace LA 28
23056-D1 Smith Boston 23
23056-D2 Mark Adelaide 26
23056-D3 Hose Mexico 25
23056-D4 Mandy Victoria 24
Each ID with -PF is unique in the table.
Each ID with the -Dx is related to the same ID with the -PF.
Each ID with -PF may have 0 or more IDs with -Dx.
The maximum number of -Dx rows for a given -PF is 9.
For example, ID 11111-PF can have 11111-D1, 11111-D2, 11111-D3, up to 11111-D9.
Expected output for the sample data above:
ID ID (without suffix) PF_Name PF_Address PF_Age D_Name D_Address D_Age
23052-PF 23052 Peter Timbuktu 25 Jane Paris 22
23052-PF 23052 Peter Timbuktu 25 David London 24
23050-PF 23050 Sam Beijing 22 NULL NULL NULL
23051-PF 23051 Nancy NYC 26 Carson Cali 22
23056-PF 23056 Grace LA 28 Smith Boston 23
23056-PF 23056 Grace LA 28 Mark Adelaide 26
23056-PF 23056 Grace LA 28 Hose Mexico 25
23056-PF 23056 Grace LA 28 Mandy Victoria 24
I need to be able to join the -PF and -Dx as above.
If a -PF has 0 Dx rows, then D_Name, D_Address and D_Age columns in the output should return NULL.
If a -PF has one or more Dx rows, then PF_Name, PF_Address and PF_Age should repeat for each row in the output and D_Name, D_Address and D_Age should contain the values from each related Dx row.
Need to use MSSQL.
Query should not use views or create additional tables.
Thanks for all your help!
select
pf.ID,
pf.IDNum,
pf.Name as PF_Name,
pf.Address as PF_Address,
pf.Age as PF_Age,
dx.Name as D_Name,
dx.Address as D_Address,
dx.Age as D_Age
from
(
select
ID, left(ID, 5) as IDNum, Name, Address, Age
from
mytable
where
right(ID, 3) = '-PF'
) pf
left outer join
(
select
ID, left(ID, 5) as IDNum, Name, Address, Age
from
mytable
where
right(ID, 3) != '-PF'
) dx
on pf.IDNum = dx.IDNum
SqlFiddle demo: http://sqlfiddle.com/#!6/dfdbb/1
SELECT t1.ID, LEFT(t1.ID,5) "ID (without Suffix)",
t1.Name "PF_Name", t1.Address "PF_Address", t1.Age "PF_Age",
t2.Name "D_Name", t2.Address "D_Address", t2.Age "D_Age"
FROM PFTable t1
LEFT JOIN PFTable t2 ON LEFT(t1.ID,5) = LEFT(t2.ID,5)
AND RIGHT(t2.ID,2) <> 'PF' -- keep the -PF row from joining to itself
WHERE RIGHT(t1.ID,2) = 'PF'
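The self-join can be tried with Python's sqlite3 module on the sample data. SQLite has no LEFT() or RIGHT() functions, so substr(ID, 1, 5) and substr(ID, -3) stand in for them (an assumption of this sketch; in SQL Server, keep LEFT/RIGHT). The extra condition on the joined side stops each -PF row from being paired with itself, and the table name mytable is borrowed from the first answer.

```python
import sqlite3

# Sample rows from the question: (ID, Name, Address, Age).
ROWS = [
    ("23052-PF", "Peter", "Timbuktu", 25), ("23052-D1", "Jane", "Paris", 22),
    ("23052-D2", "David", "London", 24), ("23050-PF", "Sam", "Beijing", 22),
    ("23051-PF", "Nancy", "NYC", 26), ("23051-D1", "Carson", "Cali", 22),
    ("23056-PF", "Grace", "LA", 28), ("23056-D1", "Smith", "Boston", 23),
    ("23056-D2", "Mark", "Adelaide", 26), ("23056-D3", "Hose", "Mexico", 25),
    ("23056-D4", "Mandy", "Victoria", 24),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ID TEXT PRIMARY KEY, Name TEXT, Address TEXT, Age INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)

# substr(x, 1, 5) plays the role of LEFT(x, 5); substr(x, -3) the role of RIGHT(x, 3).
query = """
SELECT pf.ID, substr(pf.ID, 1, 5) AS IDNum,
       pf.Name, pf.Address, pf.Age,
       dx.Name, dx.Address, dx.Age
FROM mytable pf
LEFT JOIN mytable dx
  ON substr(dx.ID, 1, 5) = substr(pf.ID, 1, 5)
 AND substr(dx.ID, -3) <> '-PF'   -- keep the -PF row from matching itself
WHERE substr(pf.ID, -3) = '-PF'
ORDER BY pf.ID, dx.ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A -PF row with no -Dx rows (23050-PF here) still appears once, with NULLs in the D_ columns, because of the LEFT JOIN.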