Do I misunderstand joins? - sql

I'm trying to learn the the ansi-92 SQL standard, but I don't seem to understand it completely (I'm new to the ansi-89 standard as well and in databases in general).
In my example, I have three tables kingdom -< family -< species (biology classifications).
There may be kingdoms without species nor families.
There may be families without species nor kindgoms.
There may be species without kingdom or families.
Why this may happen?
Say a biologist, finds a new species but he has not classified this into a kingdom or family, creates a new family that has no species and is not sure about what kingdom it should belong, etc.
here is a fiddle (see the last query): http://sqlfiddle.com/#!4/015d1/3
I want to make a query that retrieves me every kingdom, every species, but not those families that have no species, so I make this.
select *
from reino r
left join (
familia f
right join especie e
on f.fnombre = e.efamilia
and f.freino = e.ereino
) on r.rnombre = f.freino
and r.rnombre = e.ereino;
What I think this would do is:
join family and species as a right join, so it brings every species, but not those families that have no species. So, if a species has not been classified into a family, it will appear with null on family.
Then, join the kingdom with the result as a left join, so it brings every kingdom, even if there are no families or species classified on that kingdom.
Am I wrong? Shouldn't this show me those species that have not been classified? If I do the inner query it brings what I want. Is there a problem where I'm grouping things?

You're right on your description of #1... the issue with your query is on step #2.
When you do a left join from kingdom to (family & species), you're requesting every kingdom, even if there's no matching (family & species)... however, this won't return you any (family & species) combination that doesn't have a matching kingdom.
A closer query would be:
select *
from reino r
full join (
familia f
right join especie e
on f.fnombre = e.efamilia
and f.freino = e.ereino
) on r.rnombre = f.freino
and r.rnombre = e.ereino;
Notice that the left join was replaced with a full join...
however, this only returns families that are associated with a species... it doesn't return any families that are associated with kingdoms but not species.
After re-reading your question, this is actually want you wanted...
EDIT: On further thought, you could re-write your query like so:
select *
from
especie e
left join familia f
on f.fnombre = e.efamilia
and f.freino = e.ereino
full join reino r
on r.rnombre = f.freino
and r.rnombre = e.ereino;
I think this would be preferrable, because you eliminate the RIGHT JOIN, which are usually frowned upon for being poor style... and the parenthesis, which can be tricky for people to parse correctly to determine what the result will be.

In case this helps:
Relationally speaking, [OUTER JOIN is] a kind of shotgun marriage: It
forces tables into a kind of union—yes, I do mean union, not join—even
when the tables in question fail to conform to the usual requirements
for union. It does this, in effect, by padding one or
both of the tables with nulls before doing the union, thereby making
them conform to those usual requirements after all. But there's no
reason why that padding shouldn't be done with proper values instead
of nulls, as in this example:
SELECT SNO , PNO
FROM SP
UNION
SELECT SNO , 'nil' AS PNO
FROM S
WHERE SNO NOT IN ( SELECT SNO FROM SP )
The above is equivalent to:
SELECT SNO , COALESCE ( PNO , 'nil' ) AS PNO
FROM S NATURAL LEFT OUTER JOIN SP
Source:
SQL and Relational Theory: How to Write Accurate SQL Code By C. J. Date

If you want the query rewritten with only the slightest change from what you have, you can change the LEFT join to a FULL join. You can further remove the redundant parenthesis and the r.rnombre = f.freino from the ON condition:
select *
from reino r
full join --- instead of LEFT JOIN
familia f
right join especie e
on f.fnombre = e.efamilia
and f.freino = e.ereino
on r.rnombre = e.ereino;
---removed the: r.rnombre = f.freino

Try to use this:
select *
from reino r
join especie e on (r.rnombre = e.ereino)
join familia f on (f.freino = e.ereino and f.fnombre = e.efamilia)
could it be, that you interchanged efamilia and enombre in table especie?

Related

How to combine taking several joins and add constraints on query?

How can I answer the following question by quering this database:
The police is looking for a brown hair coloured woman that checked in in the gym somewhere between september 8th 2016 and october 24th 2016.This woman has a silver membership. Can we find the name of this woman?
I tried the following query:
dbGetQuery(db,"
SELECT *
FROM get_fit_now_member
JOIN get_fit_now_check_in ON id = membership_id
WHERE check_in_date BETWEEN '20160909' AND '20161023' AND membership_status = 'silver'
")
This gives me the following output:
The problem is that I have to join multiple times and at the same time have to add different constrains. How can I solve this question in a clever way?
Here's how I would write the query:
SELECT m.name
FROM get_fit_now_member AS m
JOIN get_fit_now_check_in AS c ON m.id = c.membership_id
JOIN person AS p ON m.person_id = p.id
JOIN drivers_license AS d ON p.license_id = d.id
WHERE c.check_in_date BETWEEN '20160908' AND '20161024'
AND m.membership_status = 'silver'
AND d.hair_color = 'brown';
JOIN is just an operator, like + is in arithmetic. In arithmetic, you can extend the expressions with more terms, like a + b + c + d. In SQL, you can use JOIN multiple times in a similar way.
I used correlation names (m, c, p, d) to make it more convenient to qualify the table names, so I can be clear for example which id I mean in each join condition, since a column named id exists in multiple tables.
I also changed the date expression, because I assume "between" is meant to include the two dates named in the problem statement.

How to do multiple left joins in SQL Server SQL Query

I'm trying to perform the query below on SQL Server:
SELECT
[dbo].[Machine].[MachineID],
[dbo].[Machine].[CompanyID],
[dbo].[Company].[AccountRef],
[dbo].[Machine].[ProductTypeID],
[dbo].[Machine].[SerialNo],
[dbo].[Machine].[InstallationDate],
[dbo].[Machine].[SalesTypeID],
[dbo].[SalesType].[SalesType],
[dbo].[Machine].[LeasingCompanyID],
[dbo].[LeasingCompany].[Name],
[dbo].[Machine].[QuarterlyRentalCost],
[dbo].[Machine].[Term],
[dbo].[Machine].[ExpiryDate],
[dbo].[Machine].[Scales],
[dbo].[Machine].[Chips],
[dbo].[Machine].[ContractTypeID],
[dbo].[ContractType].[ContractType],
[dbo].[Machine].[ContractCost],
[dbo].[Machine].[InvoiceDate],
[dbo].[Machine].[ServiceDueDate],
[dbo].[Machine].[ServiceNotes],
[dbo].[Machine].[modelID],
[dbo].[Machine].[Model],
[dbo].[Machine].[IMP_Machine Reference],
[dbo].[Machine].[Smart]
FROM
[dbo].[Machine], [dbo].[Company], [dbo].[SalesType], [dbo].[LeasingCompany], [dbo].[ContractType]
LEFT JOIN
[dbo].[Machine] as A ON A.[CompanyID] = [dbo].[Company].[CompanyID]
LEFT JOIN
[dbo].[Machine] as B ON B.[SalesTypeID] = [dbo].[SalesType].[SalesTypeID]
LEFT JOIN
[dbo].[Machine] as C ON C.[LeasingCompanyID] = [dbo].[LeasingCompany].[LeasingCompanyID]
LEFT JOIN
[dbo].[Machine] as D ON D.[ContractTypeID] = [dbo].[ContractType].[ContractTypeID] ;
But for some reason that i cannot see for the life of me, the destination column name in the bottom 3 join statements is reporting "The multi part identifier could not be bound".
Could anyone assist please?
Many Thanks,
You were pretty close, for readability, you dont need the extensive [dbo].[table] all over. You can define it once in the from clause with an alias, then use that alias the rest of the way through. Also, make the alias make sense to the context of the table as you will see in this below. LeasingCompany lc, SalesType st, etc.
Also, I try to have indented so I always know the first table of the join relationship and the JOIN indented on where it is being joined to. Then I keep the orientation of the from/to table aliases the same context.
Since the machine table is used once and joined to 4 different lookup tables, you can reuse the same "m" alias to each underlying. Think of a tree and branches. The root tree is your "Machine", and all the branches are the lookups.
SELECT
m.MachineID,
m.CompanyID,
c.AccountRef,
m.ProductTypeID,
m.SerialNo,
m.InstallationDate,
m.SalesTypeID,
st.SalesType,
m.LeasingCompanyID,
lc.LeasingCompany.Name,
m.QuarterlyRentalCost,
m.Term,
m.ExpiryDate,
m.Scales,
m.Chips,
m.ContractTypeID,
ct.ContractType,
m.ContractCost,
m.InvoiceDate,
m.ServiceDueDate,
m.ServiceNotes,
m.modelID,
m.Model,
m.IMP_Machine Reference,
m.Smart
FROM
Machine m
LEFT JOIN Company c
ON m.CompanyID = c.CompanyID
LEFT JOIN SalesType st
ON m.SalesTypeID = st.SalesTypeID
LEFT JOIN LeasingCompany lc
ON m.LeasingCompanyID = lc.LeasingCompanyID
LEFT JOIN ContractType ct
ON m.ContractTypeID = ct.ContractTypeID ;

Nesting multiple same select queries and reuse without second round trip to database

I am having a rows with two different IDs in database. Now I am trying to show two different data columns in one row, I tried something like this:
SELECT
[dbo].[fnHexToNumber]([Participant].[Stake]) AS [PlayerStake],
(SELECT [dbo].[fnHexToNumber]([Stake])
FROM [dbo].[Participant_Complete]
WHERE [ParticipantId] = [Fold].[HouseParticipantId]) AS [HouseStake],
([dbo].[fnHexToNumber]([Participant].[Stake]) + [dbo].[fnHexToNumber]([C].[RunningWinLoss])) AS [PlayerStakeAfterRound],
(SELECT [dbo].[fnHexToNumber]([Stake])
FROM [dbo].[Participant_Complete]
WHERE [ParticipantId] = [Fold].[HouseParticipantId]) - [dbo].[fnHexToNumber]([C].[RunningWinLoss]) AS [HouseStakeAfterRound]
FROM
[dbo].[Round_Complete] AS [C]
INNER JOIN
[dbo].[Fold_Complete] AS [Fold] ON [Fold].[Id] = [C].[Id]
INNER JOIN
[dbo].[Participant_Complete] AS [Participant] ON [Participant].[ParticipantId] = [Fold].[PlayerParticipantId]
This works, but as you can see it will do two trips to database for same nested select. How can I make this only one round trip?
You are referring to the subqueries. That is not a "round trip to the database", which usually refers to an application calling a query.
All the square braces make the query hard to read, but you can fix this using apply:
SELECT [dbo].[fnHexToNumber](p.[Stake]) AS PlayerStake,
h.HouseStake,
([dbo].[fnHexToNumber](p.[Stake]) + [dbo].[fnHexToNumber]([C].RunningWinLoss)) AS PlayerStakeAfterRound,
(h.HouseStake - [dbo].fnHexToNumber(C.RunningWinLoss)) AS HouseStakeAfterRound
FROM [dbo].[Round_Complete] c JOIN
[dbo].[Fold_Complete] f
ON f.[Id] = c.[Id] JOIN
[dbo].[Participant_Complete] pc
ON px.[ParticipantId] = f.[PlayerParticipantId] OUTER APPLY
(SELECT [dbo].[fnHexToNumber]([Stake]) as HouseStake
FROM [dbo].[Participant_Complete] pch
WHERE pch.ParticipantId = f.HouseParticipantId
) h
Just join the same table a second time instead of pulling the data as sub-queries.
Also, you only need brackets around names if the names contain a space (which is bad practice in general). If the names don't have a space, the brackets are totally extraneous.
SELECT
dbo.fnHexToNumber(Participant.Stake) AS PlayerStake,
dbo.fnHexToNumber(p.Stake) as HouseStake,
(dbo.fnHexToNumber(Participant.Stake) + dbo.fnHexToNumber(C.RunningWinLoss)) AS PlayerStakeAfterRound,
dbo.fnHexToNumber(p.Stake) - dbo.fnHexToNumber(c.RunningWinLoss) as HouseStakeAfterRound
FROM dbo.Round_Complete AS C
INNER JOIN dbo.Fold_Complete AS Fold
ON Fold.Id = C.Id
INNER JOIN dbo.Participant_Complete AS Participant
ON Participant.ParticipantId = Fold.PlayerParticipantId
INNER JOIN dbo.Participant_Complete AS p
ON p.ParticipantId = Fold.HouseParticipantId

Joining a selected table to a cross joined table

I have a table, flight_schedule, that consists of a bunch of flight segments. Below I have a Terradata SQL query that creates a list of two segment itineraries between Chicago and Denver. i.e. each row has two flights that eventually get the passenger from Chicago to Denver. For example, the first row contains flight information on a leg from Chicago to Omaha and then a later leg from Omaha to Denver. This query works just fine.
SELECT A.flt_num, A.dprt_sta_cd, A.arrv_sta_cd, A.sch_dprt_dtml, A.sch_arrv_dtml,
B.flt_num, B.dprt_sta_cd, B.arrv_sta_cd, B.sch_dprt_dtml, B.sch_arrv_dtml
FROM
flight_schedule A
CROSS JOIN
flight_schedule B
WHERE
A.dprt_sta_cd = 'Chicago' AND
B.arrv_sta_cd = 'Denver' AND
A.arrv_sta_cd = B.dprt_sta_cd AND
A.sch_arrv_dtml < B.sch_dprt_dtml
ORDER BY B.sch_arrv_dtml;
I have another table, flight_seat_inventory, that consists of seats available in different cabins for each flight number. The query below aggregates total available seats for each flight number. This query is also A-OK.
SELECT flt_num, SUM(seat_cnt) as avail_seats
FROM flight_seat_inventory
GROUP BY flt_num;
I want to combine these two queries with a LEFT JOIN, twice, so that each flight has a corresponding avail_seats value. How can I do this?
For added clarity, I think my desired Select statement looks like this:
SELECT A.flt_num, A.dprt_sta_cd, A.arrv_sta_cd, A.sch_dprt_dtml, A.sch_arrv_dtml, C.avail_seats
B.flt_num, B.dprt_sta_cd, B.arrv_sta_cd, B.sch_dprt_dtml, B.sch_arrv_dtml, D.avail_seats
flight_schedule is HUGE, so I suspect it's more efficient to do the LEFT JOIN after the CROSS JOIN. Again, using Teradata SQL.
Thanks!
I needed to declare the second seats query as a temporary table using a WITH command before I did the LEFT JOIN:
WITH tempSeatsTable AS (
SELECT flt_num, SUM(seat_cnt) as avail_seats
FROM flight_seat_inventory
GROUP BY flt_num
)
SELECT
A.flt_num, A.dprt_sta_cd, A.arrv_sta_cd, A.sch_dprt_dtml, A.sch_arrv_dtml, C.avail_seats
B.flt_num, B.dprt_sta_cd, B.arrv_sta_cd, B.sch_dprt_dtml, B.sch_arrv_dtml, D.avail_seats
FROM
flight_schedule A
CROSS JOIN
flight_schedule B
LEFT JOIN
tempSeatsTable C
ON A.flt_num = C.flt_num
LEFT JOIN
tempSeatsTable D
ON B.flt_num = D.flt_num
WHERE
A.dprt_sta_cd = 'Chicago' AND
B.arrv_sta_cd = 'Denver' AND
A.arrv_sta_cd = B.dprt_sta_cd AND
A.sch_arrv_dtml < B.sch_dprt_dtml
ORDER BY B.sch_arrv_dtml;

When to add joins to a query

Background
We have a bunch of Java code which generates an SQL query. When generating the conditions for the WHERE part, we are having some difficulties inducing the necessary joins.
The purpose of the query is always to return the zoo's id (see Database Structure).
Database Structure
Context
In the UI the user defines separately the filtering logic and the WHERE clause pieces. An example
Filtering logic: "1 OR (2 AND 3) OR (4 AND 5)"
WHERE clause pieces:
1: animal.name = "jack"
2: zoo.city = "Los Angeles"
3: animal.species = "bear"
4: animal.species = "fish"
5: animal.name = "henrietta"
Problem
The problem is that depending on the filtering logic and the WHERE clause pieces, extra joins might be necessary. At the very least new join is required when you have a query with like "(species = A AND name = B) AND (species = C AND name = D)" (both species and name columns are compared to different values while being in AND relation). Note that this query only makes sense because it is suppose to return the zoo's id where all of these conditions are true.
So my problem is that I do not know how many joins are absolutely necessary. I don't know the logic with which I will arrive at the answer. Because an extra join is not necessary if you have a query like "(species = A OR species = B)".
Expected Behavior
The following query should return the IDs of all zoos where there are two animals with names jack and tanya.
SELECT zoo.id FROM zoo
INNER JOIN animal a1 ON zoo.id = a1.zoo_id
INNER JOIN animal a2 ON zoo.id = a2.zoo_id
WHERE
a1.name = "jack"
AND a2.name = "tanya";
This query illustrates the need for two joins. The next one illustrates the case where only one join is sufficient.
SELECT zoo.id FROM zoo
INNER JOIN animal ON zoo.id = animal.zoo_id
WHERE
animal.name = "jack"
OR animal.name = "tanya";
Naive Solution
The simplest possible solution is to add one join for each animal table reference but there are serious performance consequences; increase of thousands of percents when compared to minimal joins.
This baseline for any query of what you have would be as follows...
SELECT
a.species,
a.name,
z.city
from
animals a
join zoo z
ON a.zoo_id = z.id
Then, just add your query criteria into the WHERE clause, such as
where
( a.name = 'jack' )
OR ( a.species = 'bear' AND z.city = 'Los Angeles' )
OR ( a.species = 'fish' AND a.name = 'henrietta' )
It goes through all the animals entries ONCE and pulls out those that qualify. I would also have an index on animals by (species, name), and the zoo table indexed on (id, city)
You don't need extra joins.
Just join on the Zoo ID and you can add to the end of your statement "WHERE...."
EDIT: Are you doing "JOIN ON" ? If so, take the filter out of the join and just put it at the end.