When to add joins to a query - sql

Background
We have a bunch of Java code which generates an SQL query. When generating the conditions for the WHERE part, we are having some difficulties working out which joins are necessary.
The purpose of the query is always to return the zoo's id (see Database Structure).
Database Structure
Context
In the UI, the user defines the filtering logic and the WHERE clause pieces separately. An example:
Filtering logic: "1 OR (2 AND 3) OR (4 AND 5)"
WHERE clause pieces:
1: animal.name = "jack"
2: zoo.city = "Los Angeles"
3: animal.species = "bear"
4: animal.species = "fish"
5: animal.name = "henrietta"
Problem
The problem is that, depending on the filtering logic and the WHERE clause pieces, extra joins might be necessary. At the very least, a new join is required when you have a query like "(species = A AND name = B) AND (species = C AND name = D)" (both the species and name columns are compared to different values while being in an AND relation). Note that this query only makes sense because it is supposed to return the IDs of zoos where all of these conditions are true.
So my problem is that I do not know how many joins are absolutely necessary, and I don't know the logic by which I would arrive at the answer. An extra join is not necessary, for example, if you have a query like "(species = A OR species = B)".
Expected Behavior
The following query should return the IDs of all zoos where there are two animals with names jack and tanya.
SELECT zoo.id FROM zoo
INNER JOIN animal a1 ON zoo.id = a1.zoo_id
INNER JOIN animal a2 ON zoo.id = a2.zoo_id
WHERE
a1.name = 'jack'
AND a2.name = 'tanya';
This query illustrates the need for two joins. The next one illustrates the case where only one join is sufficient.
SELECT zoo.id FROM zoo
INNER JOIN animal ON zoo.id = animal.zoo_id
WHERE
animal.name = 'jack'
OR animal.name = 'tanya';
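The difference between the two cases can be checked on a tiny dataset. The following is a minimal sketch using Python's built-in sqlite3 module; the table layout (zoo.id, animal.zoo_id, animal.name, animal.species) is assumed from the queries above, and the sample rows and the zoo_ids helper are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: zoo 1 has both jack and tanya, zoo 2 only has jack.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zoo (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE animal (id INTEGER PRIMARY KEY, zoo_id INTEGER, name TEXT, species TEXT);
    INSERT INTO zoo VALUES (1, 'Los Angeles'), (2, 'Boston');
    INSERT INTO animal VALUES
        (1, 1, 'jack', 'bear'),
        (2, 1, 'tanya', 'fish'),
        (3, 2, 'jack', 'fish');
""")

def zoo_ids(sql):
    """Run a query and return the distinct zoo ids it yields, sorted."""
    return sorted({row[0] for row in conn.execute(sql)})

# Two aliases of animal: each AND-ed condition may match a different animal row.
two_joins = zoo_ids("""
    SELECT zoo.id FROM zoo
    INNER JOIN animal a1 ON zoo.id = a1.zoo_id
    INNER JOIN animal a2 ON zoo.id = a2.zoo_id
    WHERE a1.name = 'jack' AND a2.name = 'tanya'
""")

# One join with AND: a single animal row can never have both names.
one_join_and = zoo_ids("""
    SELECT zoo.id FROM zoo
    INNER JOIN animal ON zoo.id = animal.zoo_id
    WHERE animal.name = 'jack' AND animal.name = 'tanya'
""")

# One join with OR: a single animal row is enough to satisfy the condition.
one_join_or = zoo_ids("""
    SELECT zoo.id FROM zoo
    INNER JOIN animal ON zoo.id = animal.zoo_id
    WHERE animal.name = 'jack' OR animal.name = 'tanya'
""")

print(two_joins, one_join_and, one_join_or)
```

With this data the two-join query finds only zoo 1, the single-join AND query finds nothing, and the single-join OR query finds both zoos.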
Naive Solution
The simplest possible solution is to add one join for each reference to the animal table, but this has serious performance consequences: an increase of thousands of percent compared with the minimal number of joins.

The baseline for any query of the kind you have would be as follows:
SELECT
a.species,
a.name,
z.city
from
animals a
join zoo z
ON a.zoo_id = z.id
Then, just add your query criteria into the WHERE clause, such as
where
( a.name = 'jack' )
OR ( a.species = 'bear' AND z.city = 'Los Angeles' )
OR ( a.species = 'fish' AND a.name = 'henrietta' )
It goes through all the animals entries ONCE and pulls out those that qualify. I would also have an index on animals by (species, name), and the zoo table indexed on (id, city)

You don't need extra joins.
Just join on the Zoo ID and you can add to the end of your statement "WHERE...."
EDIT: Are you doing "JOIN ON" ? If so, take the filter out of the join and just put it at the end.

Related

Conditional join that changes number of join conditions

I am trying to join data based on the following scenario.
Let's say there are two businesses. Business 1 has one field for customer data, business 2 has two fields. I need to join to multiple other tables using these customer fields.
I would like to create a join that joins on just field 1 for business 1, but field 1 AND field 2 for business 2. In other words, there is a more granular identifier available for business 2, but it is still valid to join on just field 1 for business 1 as well. It also needs to function like an inner join, in that we are only preserving the relevant data that match these conditions.
The code would look something like this for business 1:
FROM customer_data a
INNER JOIN marketing_data b
ON a.member_number = b.member_number
WHERE business_number = 1
And something like this for business 2:
FROM customer_data a
INNER JOIN marketing_data b
ON a.member_number = b.member_number
AND a.sub_member_number = b.sub_member_number
WHERE business_number = 2
I am hoping to extract both sets of data in one join statement. Also, just in case it helps, I am using the Snowflake platform to write my queries.
The following should work for both cases.
FROM customer_data a
INNER JOIN marketing_data b ON a.member_number = b.member_number
WHERE (
a.sub_member_number = b.sub_member_number
AND business_number = 2
)
OR business_number = 1
You can put the conditions in the ON clause like this:
FROM customer_data cd INNER JOIN
marketing_data md
ON cd.member_number = md.member_number AND
( cd.business_number <> 2 OR
cd.sub_member_number = md.sub_member_number
)
Note: this generalizes beyond just businesses 1 and 2, with the special condition only applying to 2. The first condition can be = 1 if you want to be more specific.
Also note that this introduces meaningful table aliases rather than arbitrary letters. This makes queries much easier to understand.
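The ON-clause variant can be tried out on a small dataset. This is a sketch with Python's sqlite3 module, not Snowflake; it assumes business_number lives on customer_data (as the aliases in the answer suggest), and the campaign column and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_data (
        business_number INTEGER, member_number INTEGER, sub_member_number INTEGER);
    CREATE TABLE marketing_data (
        member_number INTEGER, sub_member_number INTEGER, campaign TEXT);
    -- Business 1 has no sub member number; business 2 does.
    INSERT INTO customer_data VALUES (1, 100, NULL), (2, 200, 1), (2, 200, 2);
    INSERT INTO marketing_data VALUES (100, NULL, 'A'), (200, 1, 'B'), (200, 3, 'C');
""")

rows = conn.execute("""
    SELECT cd.business_number, cd.member_number, cd.sub_member_number, md.campaign
    FROM customer_data cd
    INNER JOIN marketing_data md
        ON cd.member_number = md.member_number
       AND (cd.business_number <> 2
            OR cd.sub_member_number = md.sub_member_number)
    ORDER BY cd.business_number, cd.sub_member_number
""").fetchall()

for row in rows:
    print(row)
```

The business 1 row matches on member_number alone, while the business 2 rows survive only when sub_member_number matches as well, so customer (2, 200, 2) drops out just as it would with a plain inner join on both fields.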

Join 3 tables using column common to all 3 tables

I am a total SQL novice, so please bear with me. I have three tables that are set up in the following fashion:
date|country|Test 1|Test 2|Test 3|etc.
The data in the date and country columns are identical across the three tables, and the differences are in the data in the Test columns. I'd like to use Join to query one date column and the three corresponding Test columns from the three tables.
I'm planning on just re-building the table so that the Test columns in the other tables are additional columns in the one table, but I'd still like to know how to use Join in this way. This is what I have at the moment, although it's throwing an error saying that there's an error in the syntax of the FROM clause. It's worth noting that I'm running this query in VBA using an Access DB.
SELECT r.CRDate, r.Test, p.Test, z.Test
FROM CountryRaw as r
INNER JOIN CountryPct as p ON p.CPctDate = r.CRDate
INNER JOIN CountryZ as z ON z.CZDate = p.CPctDate
WHERE r.Country = 'US' AND p.Country = 'US' AND z.Country = 'US'
I came across something using SELECT COALESCE(r.CRDate, p.CPctDate, z.CZDate) to start, but I didn't get anywhere with that.
MS Access requires extra parentheses. So try this:
SELECT r.CRDate, r.Test, p.Test, z.Test
FROM (CountryRaw as r INNER JOIN
CountryPct as p
ON p.CPctDate = r.CRDate
) INNER JOIN
CountryZ as z
ON z.CZDate = p.CPctDate
WHERE r.Country = 'US' AND p.Country = 'US' AND z.Country = 'US'
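Since the query is assembled in code anyway, the nesting can be generated rather than typed by hand. Below is a rough sketch in Python (the same idea carries over to VBA string building); access_from_clause is a hypothetical helper, not part of any library, and it only handles a chain of inner joins.

```python
def access_from_clause(first_table, joins):
    """Build a FROM clause with the nested parentheses MS Access expects.

    joins is a list of (table_with_alias, on_condition) pairs,
    applied in order as INNER JOINs.
    """
    clause = first_table
    for i, (table, condition) in enumerate(joins):
        clause = f"{clause} INNER JOIN {table} ON {condition}"
        # Wrap every join except the last, so each step joins exactly two operands.
        if i < len(joins) - 1:
            clause = f"({clause})"
    return "FROM " + clause

from_clause = access_from_clause(
    "CountryRaw AS r",
    [
        ("CountryPct AS p", "p.CPctDate = r.CRDate"),
        ("CountryZ AS z", "z.CZDate = p.CPctDate"),
    ],
)
print(from_clause)
```

For the three tables here this produces the same parenthesised FROM clause as the query above, on a single line.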

Changing old join to new join

I am new to SQL, so any help is greatly appreciated. I have a query that seems to be working but uses old-style joins, and I need to change it to new-style joins. The current query looks like this:
SELECT
STAR.V_DISASTER_DIMENSIONS .DISASTER_NUMBER,
STAR.PA_PROJECT_DIMENSIONS .PW_NUMBER,
STAR.PA_PROJECT_SITE_DIMENSIONS.SITE_NUMBER,
STAR.PA_PROJECT_FACTS .PROJECT_AMOUNT,
STAR.PA_MITIGATION_DIMENSIONS .MITIGATION_ACTIVITY_STATUS
FROM
STAR.V_DISASTER_DIMENSIONS,
STAR.PA_PROJECT_DIMENSIONS,
STAR.PA_PROJECT_SITE_DIMENSIONS,
STAR.PA_MITIGATION_DIMENSIONS,
STAR.PA_PROJECT_FACTS,
STAR.PA_PROJECT_SITE_FACTS
WHERE
( STAR.PA_PROJECT_DIMENSIONS.PA_PROJECT_ID = STAR.PA_PROJECT_FACTS.PA_PROJECT_ID )
AND
( STAR.PA_PROJECT_FACTS.DISASTER_ID = STAR.V_DISASTER_DIMENSIONS.DISASTER_ID )
AND
( STAR.PA_MITIGATION_DIMENSIONS.PA_MITIGATION_ID = STAR.PA_PROJECT_FACTS.PA_PROJECT_ID )
AND
( STAR.PA_PROJECT_SITE_FACTS.PA_PROJECT_ID = STAR.PA_MITIGATION_DIMENSIONS.PA_MITIGATION_ID )
AND
( STAR.PA_PROJECT_SITE_FACTS.DISASTER_ID = STAR.V_DISASTER_DIMENSIONS.DISASTER_ID )
AND
( STAR.PA_PROJECT_SITE_FACTS.PA_PROJECT_ID = STAR.PA_PROJECT_DIMENSIONS.PA_PROJECT_ID )
AND
( STAR.PA_PROJECT_SITE_FACTS.PA_PROJECT_SITE_ID = STAR.PA_PROJECT_SITE_DIMENSIONS.PA_PROJECT_SITE_ID )
My attempt to convert is below. I don't know where to put the extra conditions because they are not 1 to 1 with tables.
FROM
STAR.V_DISASTER_DIMENSIONS
JOIN STAR.PA_PROJECT_SITE_FACTS ON STAR.PA_PROJECT_SITE_FACTS.DISASTER_ID = STAR.V_DISASTER_DIMENSIONS.DISASTER_ID
JOIN STAR.PA_PROJECT_DIMENSIONS ON STAR.PA_PROJECT_SITE_FACTS.PA_PROJECT_ID = STAR.PA_PROJECT_DIMENSIONS.PA_PROJECT_ID
JOIN STAR.PA_PROJECT_SITE_DIMENSIONS ON STAR.PA_PROJECT_SITE_FACTS.PA_PROJECT_SITE_ID = STAR.PA_PROJECT_SITE_DIMENSIONS.PA_PROJECT_SITE_ID
JOIN STAR.PA_MITIGATION_DIMENSIONS ON STAR.PA_PROJECT_SITE_FACTS.PA_PROJECT_ID = STAR.PA_MITIGATION_DIMENSIONS.PA_MITIGATION_ID
JOIN STAR.PA_PROJECT_FACTS ON (
STAR.PA_PROJECT_FACTS .DISASTER_ID = STAR.V_DISASTER_DIMENSIONS.DISASTER_ID AND
STAR.PA_MITIGATION_DIMENSIONS.PA_MITIGATION_ID = STAR.PA_PROJECT_FACTS .PA_PROJECT_ID AND
STAR.PA_PROJECT_DIMENSIONS .PA_PROJECT_ID = STAR.PA_PROJECT_FACTS .PA_PROJECT_ID
)
Change , to INNER JOINs with ON condition:
SELECT
DD.DISASTER_NUMBER,
PD.PW_NUMBER,
PSD.SITE_NUMBER,
PF.PROJECT_AMOUNT,
MD.MITIGATION_ACTIVITY_STATUS
FROM
STAR.PA_PROJECT_DIMENSIONS PD
INNER JOIN STAR.PA_PROJECT_FACTS PF ON PD.PA_PROJECT_ID=PF.PA_PROJECT_ID
INNER JOIN STAR.V_DISASTER_DIMENSIONS DD ON DD.DISASTER_ID=PF.DISASTER_ID
INNER JOIN STAR.PA_MITIGATION_DIMENSIONS MD ON MD.PA_MITIGATION_ID=PF.PA_PROJECT_ID
INNER JOIN STAR.PA_PROJECT_SITE_FACTS PSF ON PSF.PA_PROJECT_ID=MD.PA_MITIGATION_ID
AND PSF.DISASTER_ID=DD.DISASTER_ID
AND PSF.PA_PROJECT_ID=PD.PA_PROJECT_ID
INNER JOIN STAR.PA_PROJECT_SITE_DIMENSIONS PSD ON PSD.PA_PROJECT_SITE_ID=PSF.PA_PROJECT_SITE_ID
Select * from
a,b
where a.z = b.y
would be written as
Select * from
a
INNER JOIN
b
ON a.z = b.y
It is easy. Just start with the facts table and join related tables on foreign key = key.
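A quick way to convince yourself that the two styles are equivalent is to run both against the same data. Here is a small sketch with Python's sqlite3 module; tables a and b and the join columns z and y come from the example above, while the label and note columns and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (z INTEGER, label TEXT);
    CREATE TABLE b (y INTEGER, note TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO b VALUES (2, 'b2'), (3, 'b3'), (4, 'b4');
""")

# Old style: comma-separated tables, join condition in WHERE.
old_style = conn.execute("""
    SELECT a.z, a.label, b.note
    FROM a, b
    WHERE a.z = b.y
    ORDER BY a.z
""").fetchall()

# New style: explicit INNER JOIN, join condition in ON.
new_style = conn.execute("""
    SELECT a.z, a.label, b.note
    FROM a
    INNER JOIN b ON a.z = b.y
    ORDER BY a.z
""").fetchall()

print(old_style == new_style, new_style)
```

Both queries return the same two matching rows; only the place where the join condition is written changes.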
First of all, you should use table aliases to make the query more readable. Use some lowercase letters, too.
Then just write the table names (or the aliases) on paper and draw a line for each condition from one table to the other. Then pick one table to start with, e.g. pa_project_site_dimensions which is only linked to one table.
SELECT
dd.disaster_number,
pd.pw_number,
psd.site_number,
pf.project_amount,
md.mitigation_activity_status
FROM star.pa_project_site_dimensions psd
JOIN star.pa_project_site_facts psf ON psf.pa_project_site_id = psd.pa_project_site_id
JOIN star.v_disaster_dimensions dd ON dd.disaster_id = psf.disaster_id
JOIN star.pa_mitigation_dimensions md ON md.pa_mitigation_id = psf.pa_project_id
JOIN star.pa_project_dimensions pd ON pd.pa_project_id = psf.pa_project_id
JOIN star.pa_project_facts pf ON pf.disaster_id = dd.disaster_id
AND pf.pa_project_id = md.pa_mitigation_id
AND pf.pa_project_id = pd.pa_project_id
;
However, this is a strange query. First of all, there is no limiting condition: you simply join all records, rather than retrieving data for, say, one particular project.
Moreover, you deal with several dimensions. Obviously a project has facts (pa_project_facts) and dimensions (pa_project_dimensions). With 5 facts and 3 dimensions you'd get 15 rows with all their combinations. Then there are also project sites it seems (maybe a table pa_project_sites we don't see in the query). Either that project site has facts on its own (pa_project_site_facts) that you also combine with all rows, or a project site is linked to a project fact via pa_project_site_facts, but then pa_project_facts wouldn't have to be joined by pa_project_id only, but also by some fact ID.
Also this looks strange: md.pa_mitigation_id = psf.pa_project_id. Is a mitigation the same as a project?
So, all in all, have a look at all the columns that need to be joined on. Think about how the tables are related and whether you are building combinations that make no sense.

How can I do a SQL join to get a value 4 tables farther from the value provided?

My title is probably not very clear, so I made a little schema to explain what I'm trying to achieve. The xxxx_uid labels are foreign keys linking two tables.
Goal: Retrieve a column from the grids table by giving a proj_uid value.
I'm not very good with SQL joins and I don't know how to build a single query that will achieve that.
Currently, I'm running 3 queries to perform the operation:
1) This gives me a res_uid to work with:
select res_uid from results where results.proj_uid = VALUE order by res_uid asc limit 1
2) This gives me a rec_uid to work with:
select rec_uid from receptor_results
inner join results on results.res_uid = receptor_results.res_uid
where receptor_results.res_uid = res_uid_VALUE order by rec_uid asc limit 1
3) Get the grid column I want from the grids table:
select grid_name from grids
inner join receptors on receptors.grid_uid = grids.grid_uid
where receptors.rec_uid = rec_uid_VALUE;
Is it possible to write a single SQL query that gives me the same result as the 3 I'm currently running?
You're not limited to one JOIN in a query:
select grids.grid_name
from grids
inner join receptors
on receptors.grid_uid = grids.grid_uid
inner join receptor_results
on receptor_results.rec_uid = receptors.rec_uid
inner join results
on results.res_uid = receptor_results.res_uid
where results.proj_uid = VALUE;
select g.grid_name
from results r
join receptor_results rr on r.res_uid = rr.res_uid
join receptors rec on rec.rec_uid = rr.rec_uid
join grids g on g.grid_uid = rec.grid_uid
where r.proj_uid = VALUE
A small note about names: typically in SQL a table is named for a single item, not the group, thus "result" not "results" and "receptor" not "receptors", etc. As you work with SQL this will make sense, and names like the ones you have will seem strange. Also, one less character to type!
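The chained join can be tried end to end on a toy database. The sketch below uses Python's sqlite3 module; the column layout is inferred from the queries in the question, the sample rows are hypothetical, and DISTINCT is an addition here because the single query returns one row per matching receptor, whereas the original three-step approach used LIMIT 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grids (grid_uid INTEGER PRIMARY KEY, grid_name TEXT);
    CREATE TABLE receptors (rec_uid INTEGER PRIMARY KEY, grid_uid INTEGER);
    CREATE TABLE results (res_uid INTEGER PRIMARY KEY, proj_uid INTEGER);
    CREATE TABLE receptor_results (res_uid INTEGER, rec_uid INTEGER);
    INSERT INTO grids VALUES (1, 'north'), (2, 'south');
    INSERT INTO receptors VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO results VALUES (100, 7), (101, 8);
    INSERT INTO receptor_results VALUES (100, 10), (100, 11), (101, 12);
""")

# Walk results -> receptor_results -> receptors -> grids in one statement.
query = """
    SELECT DISTINCT g.grid_name
    FROM results r
    JOIN receptor_results rr ON rr.res_uid = r.res_uid
    JOIN receptors rec ON rec.rec_uid = rr.rec_uid
    JOIN grids g ON g.grid_uid = rec.grid_uid
    WHERE r.proj_uid = ?
"""

def grid_names_for_project(proj_uid):
    """Return the grid names reachable from the given project id."""
    return [row[0] for row in conn.execute(query, (proj_uid,))]

print(grid_names_for_project(7), grid_names_for_project(8))
```

Passing the project id as a bound parameter (the ? placeholder) also avoids pasting VALUE into the SQL string by hand.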

Do I misunderstand joins?

I'm trying to learn the ANSI-92 SQL standard, but I don't seem to understand it completely (I'm new to the ANSI-89 standard as well, and to databases in general).
In my example, I have three tables kingdom -< family -< species (biology classifications).
There may be kingdoms without species nor families.
There may be families without species nor kingdoms.
There may be species without kingdom or families.
Why might this happen?
Say a biologist, finds a new species but he has not classified this into a kingdom or family, creates a new family that has no species and is not sure about what kingdom it should belong, etc.
here is a fiddle (see the last query): http://sqlfiddle.com/#!4/015d1/3
I want to write a query that retrieves every kingdom and every species, but not those families that have no species, so I wrote this.
select *
from reino r
left join (
familia f
right join especie e
on f.fnombre = e.efamilia
and f.freino = e.ereino
) on r.rnombre = f.freino
and r.rnombre = e.ereino;
What I think this would do is:
join family and species as a right join, so it brings every species, but not those families that have no species. So, if a species has not been classified into a family, it will appear with null on family.
Then, join the kingdom with the result as a left join, so it brings every kingdom, even if there are no families or species classified on that kingdom.
Am I wrong? Shouldn't this show me those species that have not been classified? If I do the inner query it brings what I want. Is there a problem where I'm grouping things?
You're right on your description of #1... the issue with your query is on step #2.
When you do a left join from kingdom to (family & species), you're requesting every kingdom, even if there's no matching (family & species)... however, this won't return you any (family & species) combination that doesn't have a matching kingdom.
A closer query would be:
select *
from reino r
full join (
familia f
right join especie e
on f.fnombre = e.efamilia
and f.freino = e.ereino
) on r.rnombre = f.freino
and r.rnombre = e.ereino;
Notice that the left join was replaced with a full join...
however, this only returns families that are associated with a species... it doesn't return any families that are associated with kingdoms but not species.
After re-reading your question, this is actually what you wanted...
EDIT: On further thought, you could re-write your query like so:
select *
from
especie e
left join familia f
on f.fnombre = e.efamilia
and f.freino = e.ereino
full join reino r
on r.rnombre = f.freino
and r.rnombre = e.ereino;
I think this would be preferable, because you eliminate the RIGHT JOIN, which is usually frowned upon as poor style... and the parentheses, which can be tricky for people to parse correctly to determine what the result will be.
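The effect of LEFT versus FULL join described above can be reproduced on a cut-down version of the schema. This is only a sketch with Python's sqlite3 module: it leaves out the familia table for brevity, uses made-up rows, and emulates FULL JOIN with a UNION of two LEFT JOINs because older SQLite versions lack FULL JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reino (rnombre TEXT PRIMARY KEY);
    CREATE TABLE especie (enombre TEXT PRIMARY KEY, ereino TEXT);
    -- 'Fungi' has no species; 'desconocida' has no kingdom yet.
    INSERT INTO reino VALUES ('Animalia'), ('Fungi');
    INSERT INTO especie VALUES ('lobo', 'Animalia'), ('desconocida', NULL);
""")

# LEFT JOIN from kingdom: keeps every kingdom, drops unclassified species.
left_rows = conn.execute("""
    SELECT r.rnombre, e.enombre
    FROM reino r
    LEFT JOIN especie e ON e.ereino = r.rnombre
""").fetchall()

# FULL JOIN emulation: kingdoms without species plus species without kingdom.
full_rows = conn.execute("""
    SELECT r.rnombre, e.enombre
    FROM reino r LEFT JOIN especie e ON e.ereino = r.rnombre
    UNION
    SELECT r.rnombre, e.enombre
    FROM especie e LEFT JOIN reino r ON r.rnombre = e.ereino
""").fetchall()

print(sorted(left_rows, key=str))
print(sorted(full_rows, key=str))
```

The unclassified species only shows up in the second result, which is the row the original LEFT JOIN was silently dropping.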
In case this helps:
Relationally speaking, [OUTER JOIN is] a kind of shotgun marriage: It
forces tables into a kind of union—yes, I do mean union, not join—even
when the tables in question fail to conform to the usual requirements
for union. It does this, in effect, by padding one or
both of the tables with nulls before doing the union, thereby making
them conform to those usual requirements after all. But there's no
reason why that padding shouldn't be done with proper values instead
of nulls, as in this example:
SELECT SNO , PNO
FROM SP
UNION
SELECT SNO , 'nil' AS PNO
FROM S
WHERE SNO NOT IN ( SELECT SNO FROM SP )
The above is equivalent to:
SELECT SNO , COALESCE ( PNO , 'nil' ) AS PNO
FROM S NATURAL LEFT OUTER JOIN SP
Source:
SQL and Relational Theory: How to Write Accurate SQL Code By C. J. Date
If you want the query rewritten with only the slightest change from what you have, you can change the LEFT join to a FULL join. You can further remove the redundant parentheses and the r.rnombre = f.freino from the ON condition:
select *
from reino r
full join --- instead of LEFT JOIN
familia f
right join especie e
on f.fnombre = e.efamilia
and f.freino = e.ereino
on r.rnombre = e.ereino;
---removed the: r.rnombre = f.freino
Try to use this:
select *
from reino r
join especie e on (r.rnombre = e.ereino)
join familia f on (f.freino = e.ereino and f.fnombre = e.efamilia)
Could it be that you swapped efamilia and enombre in table especie?