Help me understand this particular use of nested SELECT statements - sql

From this site:
Tables:
CREATE TABLE PilotSkills
(pilot_name CHAR(15) NOT NULL,
plane_name CHAR(15) NOT NULL,
PRIMARY KEY (pilot_name, plane_name));
CREATE TABLE Hangar
(plane_name CHAR(15) NOT NULL PRIMARY KEY);
Query:
SELECT DISTINCT pilot_name
FROM PilotSkills AS PS1
WHERE NOT EXISTS
(SELECT *
FROM Hangar
WHERE NOT EXISTS
(SELECT *
FROM PilotSkills AS PS2
WHERE (PS1.pilot_name = PS2.pilot_name)
AND (PS2.plane_name = Hangar.plane_name)));
I understand the problem it's used for (set division), including the analogy that describes it as "There ain't no planes in this hangar that I can't fly!". What I don't understand is exactly what's at work here, and how it comes together to do what it says it's doing.
Having trouble stating with specifics my difficulty at the moment...
Edit: Let me just first ask what something like this does, exactly:
SELECT DISTINCT pilot_name
FROM PilotSkills
WHERE NOT EXISTS
(SELECT *
FROM Hangar)
I think I'm missing some fundamental understanding here...
Edit: Irrelevant, and it wouldn't be meaningful without the third nested SELECT, right?

What we want is a distinct list of pilots that can fly every plane in the hangar. For that to be true, for a given pilot, there cannot exist a plane they cannot fly. So, we want to go through all planes for each pilot and see if there is one they cannot fly. If there is one (that the pilot cannot fly), we remove that pilot from the list. Whoever is left can fly all planes in the hangar.
Said more formally, find a distinct list of pilot names such that for a given pilot, there does not exist a plane in the set of planes (Hangar) such that the plane does not exist in the set of the given pilot's skills.
"find a distinct list of pilot
names..."
Select Distinct pilot_name
From PilotSkills As PS1
...
"...such that for a given pilot, there
does not exist a plane in the set of
planes (Hangar)..."
Select Distinct pilot_name
From PilotSkills As PS1
Where Not Exists (
Select 1
From Hangar
...
"...such that the plane does not exist
in the set of the given pilot's
skills."
Select Distinct pilot_name
From PilotSkills As PS1
Where Not Exists (
Select 1
From Hangar As H
Where Not Exists (
Select 1
From PilotSkills As PS2
Where PS2.pilot_name = PS1.pilot_name
And PS2.plane_name = H.plane_name
)
)

As a minor comment initially, Select * is overkill in this situation. You should select a single column, or a couple of columns, but pulling all columns should be avoided, especially in sub queries where they're only used during the query and not returned in the final result set. That said, to try to break down the work flow:
Select Pilot_Name From PilotSkills - We're interested in pilot names eventually.
Where Not Exists (Select * From Hangar ...) - We only keep a pilot if no Hangar row survives the inner filter, i.e. if there is no plane in the hangar that the pilot cannot fly.
Where Not Exists (Select * From PilotSkills ...) - The inner filter keeps only those Hangar rows (planes) for which the pilot from the outer query has no matching row in PilotSkills.
Describing it as a double negative (from the other answer) is a great way to understand it. It can probably be achieved more directly.
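A small experiment can make the role of each layer concrete. The sketch below is not from the original posts: it assumes Python's built-in sqlite3 module and a few made-up sample rows, and it runs the question's stripped-down query, the one with only NOT EXISTS (SELECT * FROM Hangar). Because that subquery does not reference the outer row, it is all-or-nothing: every pilot is returned when Hangar is empty, and none otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PilotSkills (pilot_name CHAR(15) NOT NULL,
                              plane_name CHAR(15) NOT NULL,
                              PRIMARY KEY (pilot_name, plane_name));
    CREATE TABLE Hangar (plane_name CHAR(15) NOT NULL PRIMARY KEY);
    -- Sample rows are illustrative assumptions.
    INSERT INTO PilotSkills VALUES ('Celko', 'F-14 Fighter'),
                                   ('Higgins', 'B-1 Bomber'),
                                   ('Higgins', 'F-14 Fighter');
    INSERT INTO Hangar VALUES ('B-1 Bomber'), ('F-14 Fighter');
""")

# The subquery never mentions the outer row, so it is evaluated
# the same way for every pilot.
UNCORRELATED = """
    SELECT DISTINCT pilot_name
    FROM PilotSkills
    WHERE NOT EXISTS (SELECT * FROM Hangar)
    ORDER BY pilot_name
"""


def pilots(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]


with_planes = pilots(UNCORRELATED)    # Hangar has rows: nobody qualifies
conn.execute("DELETE FROM Hangar")
empty_hangar = pilots(UNCORRELATED)   # Hangar is empty: everybody qualifies

print(with_planes, empty_hangar)
```

Only the innermost, correlated SELECT ties a particular pilot to a particular plane, which is why the two-level version on its own is not meaningful for this problem.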

Conceptually it is just a double negative.
Select all the pilots for which there
does not exist a plane in the hangar
that they cannot fly.
But it seems you are asking about the mechanics of the query itself? It uses two levels of correlated sub query.
If we reduce the number of rows to a minimum and add an additional Pilots table to simplify the explanation slightly (the outer instance of PilotSkills in the query is just used to get the list of pilots), then the query would look like
SELECT pilot_name
FROM Pilots
WHERE NOT EXISTS
(SELECT *
FROM Hangar
WHERE NOT EXISTS
(SELECT *
FROM PilotSkills
WHERE (Pilots.pilot_name = PilotSkills.pilot_name)
AND (PilotSkills.plane_name = Hangar.plane_name)));
Pilots
pilot_name
===========
'Celko'
'Higgins'
Hangar
plane_name
=============
'B-1 Bomber'
'F-14 Fighter'
PilotSkills
pilot_name plane_name
=========================
'Celko' 'F-14 Fighter'
'Higgins' 'B-1 Bomber'
'Higgins' 'F-14 Fighter'
If you want to know which pilots can fly all the planes in the hangar, then:
1. For each Pilots.pilot_name in turn,
2. look at each Hangar.plane_name in turn,
3. and check whether there is a corresponding row in PilotSkills for that pilot_name, plane_name pair.
If step 3 is false then we know that there is at least one plane in the hangar the pilot cannot fly and we can stop processing that Pilots row and go onto the next one. If step 3 is true then we must then return to step 2 and check the next plane in the Hangar. If we finish processing all planes in the hangar and for each one there has been a corresponding row in PilotSkills then we know that this pilot can fly all planes.
Or to put it another way we know there does not exist a plane (as we have checked them all) for which there does not exist a matching row in the PilotSkills table.
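The walkthrough above can be mimicked with plain loops. This is only a sketch of the logic, written in Python over the sample rows shown earlier in this answer; it says nothing about how a database engine actually executes the query, and the variable and function names are made up for illustration.

```python
# Sample data copied from the tables in this answer.
pilots = ["Celko", "Higgins"]
hangar = ["B-1 Bomber", "F-14 Fighter"]
pilot_skills = {
    ("Celko", "F-14 Fighter"),
    ("Higgins", "B-1 Bomber"),
    ("Higgins", "F-14 Fighter"),
}


def can_fly_everything(pilot):
    """Check every plane in the hangar against the pilot's skills."""
    for plane in hangar:
        if (pilot, plane) not in pilot_skills:
            # A plane this pilot cannot fly: stop and move to the next pilot.
            return False
    # Every plane had a matching skills row.
    return True


# Outer loop over pilots, keeping those with no counterexample.
qualified = [p for p in pilots if can_fly_everything(p)]

# The same test phrased as the double negative used by the SQL:
# "there is no plane for which there is no matching skills row".
double_negative = [
    p for p in pilots
    if not any((p, plane) not in pilot_skills for plane in hangar)
]

print(qualified, double_negative)
```

Both lists contain only 'Higgins', since Celko has no row for the B-1 Bomber.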

Related

Solving a SQL query involving set division by another method

The problem statement for the SQL query is to find all the pilots who can fly the planes listed in the Plane table. So basically we have a Pilot table which has 2 columns namely Pilot Names and the Planes they can fly and the second table called the Plane table which has only 1 column namely the Planes column. So, we have to find all the pilots that can fly all the planes listed in the Plane column of the Plane table. I know one of the ways is to proceed via relational division but is there any other way to solve this?
The table schema looks like this:
Pilot(Pilot_Name, Planes)
Plane(Planes)
That shows the Pilot table consists of 2 columns and the Plane table consist of a single column.
The code below is the solution to the problem using set division:
SELECT DISTINCT pilot_name
FROM PilotSkills AS PS1
WHERE NOT EXISTS
(SELECT *
FROM Hangar
WHERE NOT EXISTS
(SELECT *
FROM PilotSkills AS PS2
WHERE (PS1.pilot_name = PS2.pilot_name)
AND (PS2.plane_name = Hangar.plane_name)));
This query gives the desired result. But I was wondering if there is another method to solve this question without using the above mentioned concept of set division.
I would do this using aggregation:
select pilot_name
from pilotskills
group by pilot_name
having count(*) = (select count(*) from planes);
If a plane could be listed twice for a given pilot in the skills table, the having clause should use count(distinct ...):
select pilot_name
from pilotskills
group by pilot_name
having count(distinct plane_name) = (select count(*) from planes);
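One caveat about the counting approach: comparing a pilot's total number of skills with the number of planes only works if every plane a pilot can fly is also in the hangar. The sketch below is an assumption-laden illustration, not part of the original answer. It uses Python's sqlite3 module, the PilotSkills and Hangar tables from the question, and made-up rows including a 'Piper Cub' that is not in the hangar. It compares the plain count with a variant that first joins to Hangar so that only hangar planes are counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PilotSkills (pilot_name CHAR(15) NOT NULL,
                              plane_name CHAR(15) NOT NULL,
                              PRIMARY KEY (pilot_name, plane_name));
    CREATE TABLE Hangar (plane_name CHAR(15) NOT NULL PRIMARY KEY);
    INSERT INTO Hangar VALUES ('B-1 Bomber'), ('F-14 Fighter');
    -- 'Piper Cub' is deliberately absent from Hangar (hypothetical data).
    INSERT INTO PilotSkills VALUES
        ('Celko',   'F-14 Fighter'), ('Celko',   'Piper Cub'),
        ('Higgins', 'B-1 Bomber'),   ('Higgins', 'F-14 Fighter'),
        ('Higgins', 'Piper Cub');
""")

# Counts every skill, whether or not the plane is in the hangar.
PLAIN_COUNT = """
    SELECT pilot_name
    FROM PilotSkills
    GROUP BY pilot_name
    HAVING COUNT(*) = (SELECT COUNT(*) FROM Hangar)
    ORDER BY pilot_name
"""

# Counts only skills that match a plane in the hangar.
JOINED_COUNT = """
    SELECT ps.pilot_name
    FROM PilotSkills AS ps
    JOIN Hangar AS h ON h.plane_name = ps.plane_name
    GROUP BY ps.pilot_name
    HAVING COUNT(DISTINCT ps.plane_name) = (SELECT COUNT(*) FROM Hangar)
    ORDER BY ps.pilot_name
"""

plain = [row[0] for row in conn.execute(PLAIN_COUNT)]
joined = [row[0] for row in conn.execute(JOINED_COUNT)]
print(plain, joined)
```

With this data the plain count picks Celko (two skills, two hangar planes) and misses Higgins (three skills), while the joined count returns only Higgins. Note that the joined variant returns no pilots when Hangar is empty, whereas the NOT EXISTS form returns every pilot in that case.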

Subquery that matches column with several ranges defined in table

I've got a pretty common setup for an address database: a person is tied to a company with a join table, the company can have an address and so forth.
All pretty normalized and easy to use. But for search performance, I'm creating a materialized, rather denormalized view. I only need a very limited set of information and quick queries. Most of everything that's usually done via a join table is now in an array. Depending on the query, I can either search it directly or join it via unnest.
As a complement to my zipcodes column (varchar[]), I'd like to add a states column that has the (German federal) states already precomputed, so that I don't have to transform a query to include all kinds of range comparisons.
My mapping data is in a table like this:
CREATE TABLE zip2state (
state TEXT NOT NULL,
range_start CHARACTER VARYING(5) NOT NULL,
range_end CHARACTER VARYING(5) NOT NULL
)
Each state has several ranges, and ranges can overlap (one zip code can be for two different states). Some ranges have range_start = range_end.
Now I'm a bit at wit's end on how to get that into a materialized view all at once. Normally, I'd feel tempted to just do it iteratively (via trigger or on the application level).
Or as we're just talking about 5 digits, I could create a big table mapping zip to state directly instead of doing it via a range (my current favorite, yet something ugly enough that it prompted me to ask whether there's a better way)
Any way to do that in SQL, with a table like the above (or something similar)? I'm at postgres 9.3, all features allowed...
For completeness' sake, here's the subquery for the zip codes:
(select array_agg(distinct address.zipcode)
from affiliation
join company
on affiliation.ins_id = company.id
join address
on address.com_id = company.id
where affiliation.per_id = person.id) AS zipcodes,
I suggest a LATERAL join instead of the correlated subquery to conveniently compute both columns at once. Could look like this:
SELECT p.*, z.*
FROM person p
LEFT JOIN LATERAL (
SELECT array_agg(DISTINCT d.zipcode) AS zipcodes
, array_agg(DISTINCT z.state) AS states
FROM affiliation a
-- JOIN company c ON a.ins_id = c.id -- suspect you don't need this
JOIN address d ON d.com_id = a.ins_id -- c.id
LEFT JOIN zip2state z ON d.zipcode BETWEEN z.range_start AND z.range_end
WHERE a.per_id = p.id
) z ON true;
If referential integrity is guaranteed, you don't need to join to the table company at all. I took the shortcut.
Be aware that varchar or text behaves differently than expected for numbers. For example: '333' > '0999'. If all zip codes have 5 digits you are fine.
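To illustrate the range join and the string-comparison caveat on their own, here is a small sketch using Python's sqlite3 module. It is only an approximation of the setup above: SQLite has no LATERAL or array_agg, so it shows just the BETWEEN join against zip2state, and the state names, ranges, and zip codes are placeholder assumptions rather than real mapping data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zip2state (
        state       TEXT NOT NULL,
        range_start VARCHAR(5) NOT NULL,
        range_end   VARCHAR(5) NOT NULL
    );
    CREATE TABLE address (zipcode VARCHAR(5) NOT NULL);
    -- Placeholder ranges: two overlap, one is a single zip code.
    INSERT INTO zip2state VALUES ('State A', '01000', '01999'),
                                 ('State B', '01900', '02999'),
                                 ('State C', '10115', '10115');
    INSERT INTO address VALUES ('01950'), ('10115'), ('99999');
""")

# LEFT JOIN keeps zip codes that fall into no range (state is NULL).
rows = conn.execute("""
    SELECT a.zipcode, z.state
    FROM address AS a
    LEFT JOIN zip2state AS z
           ON a.zipcode BETWEEN z.range_start AND z.range_end
    ORDER BY a.zipcode, z.state
""").fetchall()

# Text is compared character by character, not numerically.
text_cmp = conn.execute("SELECT '333' > '0999'").fetchone()[0]
number_cmp = conn.execute("SELECT 333 > 999").fetchone()[0]

print(rows)
print(text_cmp, number_cmp)
```

The zip code in the overlapping ranges appears once per matching state, which is why the query above aggregates with DISTINCT. The two comparisons differ (true as text, false as numbers), so the range lookup is only reliable when all zip codes are zero-padded to the same length.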
Related:
What is the difference between LATERAL and a subquery in PostgreSQL?

How to use the result from a second select in my first select

I am trying to use a second SELECT to get some ID, then use that ID in my first SELECT, and I have no idea how.
SELECT Employee.Name
FROM Emplyee, Employment
WHERE x = Employment.DistributionID
(SELECT Distribution.DistributionID FROM Distribution
WHERE Distribution.Location = 'California') AS x
This post got long, but here is a short "tip"
While the syntax of my select is bad, the logic is not. I need that "x" somehow. Thus the second select is the most important. Then I have to use that "x" within the first select. I just don't know how
/Tip
This is the only thing I could imagine, I'm very new at Sql, I think I need a book before practicing, but now that I've started I'd like to finish my small program.
EDIT:
Ok I looked up joins, still don't get it
SELECT Employee.Name
FROM Emplyee, Employment
WHERE x = Employment.DistributionID
LEFT JOIN Distribution ON
(SELECT Distribution.DistributionID FROM Distribution
WHERE Distribution.Location = 'California') AS x
Get error msg at AS and Left
I use name to find ID from upper red, I use the ID I find FROM upper red in lower table. Then I match the ID I find with Green. I use Green ID to find corresponding Name
I have California as output data from C#. I want to use California to find the DistributionID. I use the DistributionID to find the EmployeeID. I use EmployeeID to find Name
My logic:
Parameter: Distribution.Name (from C#)
Find DistributionID that has Distribution.Name
Look in Employment WHERE given DistributionID
reveals Employees that I am looking for (BY ID)
Use that ID to find Name
return Name
Tables:
NOTE: In this example picture the Employee repeats because of the select, they are in fact singular
In "Locatie" (middle table) is Location, I get location (again) from C#, I use California as an example. I need to find the ID first and foremost!
Sorry they are not in English, but here are the create tables:
Try this:
SELECT angajati.Nume
FROM angajati
JOIN angajari ON angajati.AngajatID = angajari.AngajatID
JOIN distribuire ON angajari.distribuireid = distribuire.distribuireid
WHERE distribuire.locatie = 'california'
As you have a table mapping employees to their distribution locations, you just need to join that one in the middle to create the mapping. You can use variables if you like for the WHERE clause so that you can call this as a stored procedure or whatever you need from the output of your C# code.
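As a rough sketch of passing the location in from application code, the example below binds it as a query parameter. It is written in Python with the built-in sqlite3 module rather than C#, purely for illustration. The table and column names are taken from the query above, but the column types, the sample rows, and the names_at helper are assumptions, since the original CREATE TABLE statements are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema; only the names come from the query above.
    CREATE TABLE Angajati (AngajatID INTEGER PRIMARY KEY, Nume TEXT NOT NULL);
    CREATE TABLE Distribuire (DistribuireID INTEGER PRIMARY KEY,
                              Locatie TEXT NOT NULL);
    CREATE TABLE Angajari (
        AngajatID     INTEGER NOT NULL REFERENCES Angajati (AngajatID),
        DistribuireID INTEGER NOT NULL REFERENCES Distribuire (DistribuireID)
    );
    -- Hypothetical sample rows.
    INSERT INTO Angajati VALUES (1, 'Ana'), (2, 'Ion'), (3, 'Maria');
    INSERT INTO Distribuire VALUES (10, 'California'), (20, 'Alba');
    INSERT INTO Angajari VALUES (1, 10), (2, 20), (3, 10);
""")


def names_at(location):
    """Return employee names for one location, bound as a parameter."""
    sql = """
        SELECT a.Nume
        FROM Angajati a
        JOIN Angajari j ON a.AngajatID = j.AngajatID
        JOIN Distribuire d ON j.DistribuireID = d.DistribuireID
        WHERE d.Locatie = ?
        ORDER BY a.Nume
    """
    return [row[0] for row in conn.execute(sql, (location,))]


california = names_at("California")
nowhere = names_at("Nowhere")
print(california, nowhere)
```

Binding the value with a placeholder instead of concatenating it into the SQL string keeps the query text fixed and avoids quoting problems; most database client libraries offer an equivalent mechanism.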
Try this solution:
DECLARE @pLocatie VARCHAR(40)='Alba'; -- p=parameter
SELECT a.AngajatID, a.Nume
FROM Angajati a
JOIN Angajari j ON a.AngajatID=j.AngajatID
JOIN Distribuire d ON j.DistribuireID=d.DistribuireID
WHERE d.Locatie=@pLocatie
You should add a unique key on the Angajari table (Employment), thus:
ALTER TABLE Angajari
ADD CONSTRAINT IUN_Angajari_AngajatID_DistribuireID UNIQUE (AngajatID, DistribuireID);
This will prevent duplicate (AngajatID, DistribuireID) pairs.
I don't know how you are connecting Emplyee(sic?) and Employment, but you want to use a join to connect two tables and in the join specify how the tables are related. Joins usually look best when they have aliases so you don't have to repeat the entire table name. The following query will get you all the information from both Employment and Distribution tables where the distribution location is equal to california. You can join employee to employment to get name as well.
SELECT *
FROM Employment e
JOIN Distribution d on d.DistributionID = e.DistributionID
WHERE d.Location = 'California'
This will return the contents of both tables. To select particular records use the alias.[Col_Name] separated by a comma in the select statement, like d.DistributionID to return the DistributionID from the Distribution Table

Modelling database for a small soccer league

The database is quite simple. Below there is a part of a schema relevant to this question
ROUND (round_id, round_number)
TEAM (team_id, team_name)
MATCH (match_id, match_date, round_id)
OUTCOME (team_id, match_id, score)
I have a problem with query to retrieve data for all matches played. The simple query below gives of course two rows for every match played.
select *
from round r
inner join match m on m.round_id = r.round_id
inner join outcome o on o.match_id = m.match_id
inner join team t on t.team_id = o.team_id
How should I write a query to have the match data in one row?
Or maybe should I redesign the database - drop the OUTCOME table and modify the MATCH table to look like this:
MATCH (match_id, match_date, team_away, team_home, score_away, score_home)?
You can almost generate the suggested change from the original tables using a self join on outcome table:
select o1.team_id team_id_1,
o2.team_id team_id_2,
o1.score score_1,
o2.score score_2,
o1.match_id match_id
from outcome o1
inner join outcome o2 on o1.match_id = o2.match_id and o1.team_id < o2.team_id
Of course, the information for home and away are not possible to generate, so your suggested alternative approach might be better after all. Also, take note of the condition o1.team_id < o2.team_id, which gets rid of the redundant symmetric match data (actually it gets rid of the same outcome row being joined with itself as well, which can be seen as the more important aspect).
In any case, using this select as part of your join, you can generate one row per match.
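A minimal sketch of that idea, assuming Python's sqlite3 module and invented team names and scores: the self join on outcome is extended with two joins to team so that each match comes back as a single row with both team names. The round and match tables are left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (team_id INTEGER PRIMARY KEY, team_name TEXT NOT NULL);
    CREATE TABLE outcome (
        team_id  INTEGER NOT NULL REFERENCES team (team_id),
        match_id INTEGER NOT NULL,
        score    INTEGER,
        PRIMARY KEY (team_id, match_id)
    );
    -- Hypothetical sample data: two matches, two outcome rows each.
    INSERT INTO team VALUES (1, 'Lions'), (2, 'Tigers'), (3, 'Bears');
    INSERT INTO outcome VALUES (1, 100, 2), (2, 100, 1),
                               (3, 101, 0), (1, 101, 0);
""")

# o1.team_id < o2.team_id keeps one of the two symmetric pairings
# and prevents a row from being paired with itself.
rows = conn.execute("""
    SELECT o1.match_id,
           t1.team_name, o1.score,
           t2.team_name, o2.score
    FROM outcome AS o1
    JOIN outcome AS o2
      ON o1.match_id = o2.match_id AND o1.team_id < o2.team_id
    JOIN team AS t1 ON t1.team_id = o1.team_id
    JOIN team AS t2 ON t2.team_id = o2.team_id
    ORDER BY o1.match_id
""").fetchall()

for row in rows:
    print(row)
```

Which team ends up in the first position depends only on the team_id ordering, so this layout still cannot say who played at home.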
You fetch two rows for every match played, but team_id and team_name differ between them:
- one for the home team
- one for the away team
So your query is correct.
Using the match table as you describe captures the logic of a game simply and naturally and additionally shows home and away teams which your initial model does not.
You might want to add the round id as a foreign key to round table and perhaps a flag to indicate a match abandoned situation.
Drop OUTCOME. It shouldn't be a separate table, because you have exactly one outcome per match.
You may also want to consider how to handle matches that are cancelled; perhaps the scores are null?

How can I compare two tables and delete on matching fields (not matching records)

Scenario: A sampling survey needs to be performed on a membership of 20,000 individuals. The survey sample size is 3,500 of the 20,000 members. All members are in table tblMember. The same survey was performed the previous year, and the members who were surveyed are in tblSurvey08. Membership data can change over the year (e.g. new email address, etc.) but the MemberID stays the same.
How do I remove the MemberIDs/records contained in tblSurvey08 from tblMember to create a new table of potential members to be surveyed (let's call it tblPotentialSurvey09)? Again, the record for an individual member may not match between the two tables, but the MemberID field will remain constant.
I am fairly new at this stuff and I seem to be having a problem Googling a solution. I could use the EXCEPT function, but the records for the individual members are not necessarily the same from one table to the next; only the MemberID is guaranteed to be the same.
Thanks
SELECT
m.* -- replace with an explicit column list
FROM
tblMember m
LEFT JOIN
tblSurvey08 s08
ON m.MemberID = s08.MemberID
WHERE
s08.MemberID IS NULL
will give you only members not in the 08 survey. On many engines this anti-join performs at least as well as a NOT IN construct, and unlike NOT IN it is not thrown off by NULLs in the compared column.
A new table is not such a great idea, since you are duplicating data. A view with the above query would be a better choice.
I apologize in advance if I didn't understand your question but I think this is what you're asking for. You can use the insert into statement.
insert into tblPotentialSurvey09
select your_criteria from tblMember where tblMember.MemberId not in (
select MemberId from tblSurvey08
)
First of all, I wouldn't create a new table just for selecting potential members. Instead, I would create a new true/false (1/0) field telling if they are eligible.
However, if you'd still want to copy data to the new table, here's how you can do it:
INSERT INTO tblPotentialSurvey09 (MemberID)
SELECT MemberID
FROM tblMember m
WHERE NOT EXISTS (SELECT 1 FROM tblSurvey08 s WHERE s.MemberID = m.MemberID)
If you just want to create a new field as I suggested, a similar query would do the job.
An outer join should do:
select m_09.MemberID
from tblMember m_09 left outer join
tblSurvey08 m_08 on m_09.MemberID = m_08.MemberID
where
m_08.MemberID is null
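The answers above use three formulations of the same anti-join: LEFT JOIN ... IS NULL, NOT IN, and NOT EXISTS. The following sketch, which assumes Python's sqlite3 module, a made-up Email column, and invented rows, runs all three side by side and then shows the one case where they diverge: a NULL MemberID in the survey table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblMember (MemberID INTEGER PRIMARY KEY, Email TEXT);
    CREATE TABLE tblSurvey08 (MemberID INTEGER, Email TEXT);
    -- Hypothetical rows: member 1 was surveyed in 2008 under an old email.
    INSERT INTO tblMember VALUES (1, 'a@new.example'),
                                 (2, 'b@mail.example'),
                                 (3, 'c@mail.example');
    INSERT INTO tblSurvey08 VALUES (1, 'a@old.example');
""")

QUERIES = {
    "left_join": """
        SELECT m.MemberID FROM tblMember m
        LEFT JOIN tblSurvey08 s ON s.MemberID = m.MemberID
        WHERE s.MemberID IS NULL ORDER BY m.MemberID""",
    "not_exists": """
        SELECT m.MemberID FROM tblMember m
        WHERE NOT EXISTS (SELECT 1 FROM tblSurvey08 s
                          WHERE s.MemberID = m.MemberID)
        ORDER BY m.MemberID""",
    "not_in": """
        SELECT MemberID FROM tblMember
        WHERE MemberID NOT IN (SELECT MemberID FROM tblSurvey08)
        ORDER BY MemberID""",
}


def run_all():
    """Run every variant and collect the returned MemberIDs."""
    return {name: [r[0] for r in conn.execute(sql)]
            for name, sql in QUERIES.items()}


before = run_all()

# A single NULL MemberID in the survey table changes the NOT IN result.
conn.execute("INSERT INTO tblSurvey08 VALUES (NULL, 'unknown@mail.example')")
after = run_all()

print(before)
print(after)
```

All three agree while tblSurvey08.MemberID contains no NULLs. Once a NULL is present, NOT IN returns no rows at all, because a comparison with NULL is neither true nor false, while the other two forms are unaffected. If MemberID is declared NOT NULL in both tables, the three are interchangeable in terms of results.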