Joins with WHERE - splitting WHERE clauses - sql

I solved the query at this link
Can you return a list of characters and TV shows that are not named "Willow Rosenberg" and not in the show "How I Met Your Mother"?
with the following code:
SELECT ch.name,sh.name
FROM character ch
INNER JOIN character_tv_show chat
ON ch.id = chat.character_id
INNER JOIN tv_show sh
ON chat.tv_show_id=sh.id
WHERE ch.name != "Willow Rosenberg" AND sh.name !="How I Met Your Mother"
;
However, my first try was:
SELECT ch.name,sh.name
FROM character ch
WHERE ch.name != "Willow Rosenberg" /*This here*/
INNER JOIN character_tv_show chat
ON ch.id = chat.character_id
INNER JOIN tv_show sh
ON chat.tv_show_id=sh.id
WHERE sh.name !="How I Met Your Mother"
;
because I thought that in this way only the table character would have been filtered before doing the joins and, therefore, it would have been less computationally heavy.
Does it make any sense?
Is there a way to "split" the WHERE clause when joining multiple tables?

Think of JOINs as a cross-product of two tables, which is filtered using the conditions specified in the ON clause. Your WHERE clause is then applied on the result set, and not on the individual tables participating in the join.
If you want to apply WHERE on only one of the joined tables, you'll have to use a sub-query. The filtered result of that sub-query will then be treated as a normal table and joined with a real table using JOIN again.
If you are doing this for performance, remember that, on properly indexed tables, a plain JOIN is usually at least as fast as the equivalent sub-query, and in some engines markedly faster. Cases where the sub-query version wins are rare.
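The mental model from this answer (cross product, then ON, then WHERE) can be sketched in plain Python. This is only an illustration of the logical semantics, not of how an engine executes the query; the sample rows are made up, loosely modelled on the tables in the question.

```python
from itertools import product

# Hypothetical sample rows (not from the original exercise).
characters = [(1, "Willow Rosenberg"), (2, "Barney Stinson"), (3, "Buffy Summers")]
shows = [(10, "Buffy the Vampire Slayer"), (20, "How I Met Your Mother")]
character_tv_show = [(1, 10), (2, 20), (3, 10)]  # (character_id, tv_show_id)


def join_then_filter():
    """Cross product -> keep rows passing the ON conditions -> apply WHERE."""
    rows = []
    for ch, link, sh in product(characters, character_tv_show, shows):
        on_ok = ch[0] == link[0] and link[1] == sh[0]  # the two ON clauses
        where_ok = ch[1] != "Willow Rosenberg" and sh[1] != "How I Met Your Mother"
        if on_ok and where_ok:
            rows.append((ch[1], sh[1]))
    return rows


def filter_then_join():
    """Filter `characters` first (the sub-query idea), then join as before."""
    kept = [ch for ch in characters if ch[1] != "Willow Rosenberg"]
    rows = []
    for ch, link, sh in product(kept, character_tv_show, shows):
        on_ok = ch[0] == link[0] and link[1] == sh[0]
        if on_ok and sh[1] != "How I Met Your Mother":
            rows.append((ch[1], sh[1]))
    return rows


print(join_then_filter())
print(filter_then_join())
```

Both functions produce the same rows, which is why a single WHERE with AND is enough to express the "split" filter logically; whether the engine filters early is left to the optimizer.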

You can use a subquery:
SELECT ch.name,sh.name
FROM (
SELECT ch.id, ch.name
FROM character ch
WHERE ch.name != "Willow Rosenberg") ch
INNER JOIN character_tv_show chat
ON ch.id = chat.character_id
INNER JOIN tv_show sh
ON chat.tv_show_id=sh.id
WHERE sh.name !="How I Met Your Mother"
But I don't think it makes sense: the subquery may be materialized as a temporary table.
The first query will be optimized by the database server, which will likely read only the rows it needs from the character table.

JOIN and WHERE clauses are not necessarily executed in the order you write them. In general, the query optimizer will rearrange things to make them as efficient as possible (or at least what it thinks is most efficient), so adding a second WHERE clause wouldn't be any different from adding another AND condition (which is why it's not allowed).
Your idea wasn't bad, but it's just not how databases actually work.

A SELECT can have only one WHERE clause.
It comes after the JOINs.
But you can have additional WHERE clauses in the sub-queries you join.
And sometimes a criterion that you've added to a WHERE clause can be moved to the ON of a JOIN.
For example, the three queries below return the same results:
-- 1. Both filters in the WHERE clause
SELECT *
FROM Table1 AS t1
JOIN Table2 AS t2 ON t2.ID = t1.table2ID
WHERE t1.Col1 = 'foo'
AND t2.Col1 = 'bar';

-- 2. Table1 filtered inside a sub-query
SELECT *
FROM
(
SELECT *
FROM Table1
WHERE Col1 = 'foo'
) AS t1
JOIN Table2 AS t2 ON t2.ID = t1.table2ID
WHERE t2.Col1 = 'bar';

-- 3. Table2 filter moved into the ON clause
SELECT *
FROM Table1 AS t1
JOIN Table2 AS t2 ON (t2.ID = t1.table2ID AND t2.Col1 = 'bar')
WHERE t1.Col1 = 'foo';
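As a quick check of that equivalence, here is a small sketch using Python's built-in sqlite3 module. The table contents are invented for the demonstration, and explicit columns replace SELECT * so the three result sets are directly comparable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table2 (ID INTEGER PRIMARY KEY, Col1 TEXT);
    CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, table2ID INTEGER, Col1 TEXT);
    -- Hypothetical sample data
    INSERT INTO Table2 VALUES (1, 'bar'), (2, 'baz');
    INSERT INTO Table1 VALUES (1, 1, 'foo'), (2, 2, 'foo'), (3, 1, 'qux');
""")

both_in_where = """
    SELECT t1.ID, t2.ID
    FROM Table1 AS t1
    JOIN Table2 AS t2 ON t2.ID = t1.table2ID
    WHERE t1.Col1 = 'foo' AND t2.Col1 = 'bar'
    ORDER BY t1.ID
"""
filtered_subquery = """
    SELECT t1.ID, t2.ID
    FROM (SELECT * FROM Table1 WHERE Col1 = 'foo') AS t1
    JOIN Table2 AS t2 ON t2.ID = t1.table2ID
    WHERE t2.Col1 = 'bar'
    ORDER BY t1.ID
"""
filter_in_on = """
    SELECT t1.ID, t2.ID
    FROM Table1 AS t1
    JOIN Table2 AS t2 ON (t2.ID = t1.table2ID AND t2.Col1 = 'bar')
    WHERE t1.Col1 = 'foo'
    ORDER BY t1.ID
"""

results = [conn.execute(q).fetchall()
           for q in (both_in_where, filtered_subquery, filter_in_on)]
print(results)
```

With an INNER JOIN all three forms give the same rows; the same rewrite is not safe for outer joins.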

Related

sql left join explanation

I found this code example online regarding SQL left joins, and I want to make sure I understand it correctly (since I am no expert):
SELECT table1.column1, table2.column2...
FROM table1
LEFT JOIN table2
ON table1.common_field = table2.common_field AND table1.common_field_2 = table2.common_field_2
WHERE table1.column3 = ... AND table2.common_field IS NULL
My question is about the AND table2.common_field IS NULL part and how it interacts with the ON above.
It seems to me that the result will contain only the rows that exist in table1 but have no match in table2 based on the common fields.
Is that correct? Can it be written more simply? The above seems confusing to me.
The first step in any SQL development is to check the data that is actually stored in the tables you intend to use in your query.
How the data is stored will affect the results of the query, particularly when filtering for NULLs or checking for the existence of a row.
Using EXISTS or NOT EXISTS to check for existence/non-existence of one or more rows is very effective, providing the WHERE clause within the EXISTS sub-query doesn't have conflicting logic (e.g. NOT EXISTS and <> are used together), which can be confusing and produce results that are difficult to test.
Does table2.common_field contain any NULLs? If it does, it would be wise to filter on those in a nested query, CTE or view first, then use the results of that in the main query.
If table2.common_field doesn't contain NULLs or has a NOT NULL constraint, then perhaps you are using table2.common_field IS NULL to filter the results of the LEFT JOIN down to the rows where there is no match on the join criteria in table2. If this is the case and you want to stick with LEFT JOIN, I recommend nesting your query and filtering on the NULL in the outer query.
Here's a couple of options:
Option 1: Use LEFT JOIN, filter on NULL in the outer query.
Note the careful use of an alias for table2.common_field which is important.
SELECT
result.*
FROM
(
SELECT table1.column1, table2.column2, table2.common_field as table2_common_field...
FROM table1
LEFT JOIN table2
ON table1.common_field = table2.common_field AND table1.common_field_2 = table2.common_field_2
WHERE table1.column3 = ...
) result
WHERE result.table2_common_field IS NULL;
Option 2 (recommended): Use NOT EXISTS.
SELECT table1.column1... -- table2 columns cannot be selected here; they would all be NULL in the LEFT JOIN version anyway
FROM table1
WHERE NOT EXISTS (
select 1
from table2
where table2.common_field = table1.common_field
AND table2.common_field_2 = table1.common_field_2
)
AND table1.column3 = ...
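To see that the LEFT JOIN ... IS NULL form and the NOT EXISTS form pick the same rows, here is a minimal sketch with Python's sqlite3 module. The rows are invented, the column3 filter is omitted for brevity, and one table2 row deliberately has a NULL common_field to show that such a row never satisfies the equality in the ON clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (common_field INTEGER, common_field_2 TEXT, column1 TEXT);
    CREATE TABLE table2 (common_field INTEGER, common_field_2 TEXT, column2 TEXT);
    -- Hypothetical sample data
    INSERT INTO table1 VALUES (1, 'a', 'r1'), (2, 'b', 'r2'), (3, 'c', 'r3');
    INSERT INTO table2 VALUES (1, 'a', 'x'), (2, 'z', 'y'), (NULL, 'c', 'z');
""")

# Anti-join written as LEFT JOIN + IS NULL (the form from the question).
left_join_is_null = conn.execute("""
    SELECT t1.column1
    FROM table1 t1
    LEFT JOIN table2 t2
      ON t1.common_field = t2.common_field
     AND t1.common_field_2 = t2.common_field_2
    WHERE t2.common_field IS NULL
    ORDER BY t1.column1
""").fetchall()

# The same anti-join written with NOT EXISTS.
not_exists = conn.execute("""
    SELECT t1.column1
    FROM table1 t1
    WHERE NOT EXISTS (
        SELECT 1
        FROM table2 t2
        WHERE t2.common_field = t1.common_field
          AND t2.common_field_2 = t1.common_field_2
    )
    ORDER BY t1.column1
""").fetchall()

print(left_join_is_null)
print(not_exists)
```

Only the table1 row that matches on both common fields is excluded; the other two come back from both queries.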

Is there a logical difference between putting a condition in the ON clause of an inner join versus the where clause of the main query?

Consider these two similar SQL statements:
(condition in ON clause)
select t1.field1, t2.field1
from
table1 t1 inner join table2 t2 on t1.id = t2.id and t1.boolfield = 1
(condition in WHERE clause)
select t1.field1, t2.field1
from
table1 t1 inner join table2 t2 on t1.id = t2.id
where t1.boolfield = 1
I have tested this out a bit and I can see the difference between putting a condition in the two different places for an outer join.
But in the case of an inner join can the result sets ever be different?
For INNER JOIN, there is no effective difference, although I think the second option is cleaner.
For LEFT JOIN, there is a huge difference. The ON clause specifies which records will be selected from the tables for comparison and the WHERE clause filters the results.
Example 1: returns all the rows from tbl1 and matches them up with the appropriate rows from tbl2 that have boolfield=1
Select *
From tbl1
LEFT JOIN tbl2 on tbl1.id=tbl2.id and tbl2.boolfield=1
Example 2: will only include rows from tbl1 that have a matching row in tbl2 with boolfield=1. It joins the tables, and then filters out the rows that don't meet the condition.
Select *
From tbl1
LEFT JOIN tbl2 on tbl1.id=tbl2.id
WHERE tbl2.boolfield=1
In your specific case, the t1.boolfield specifies an additional selection condition, not a condition for matching records between the two tables, so the second example is more correct.
If you're speaking about the cases when a condition for matching records is put in the ON clause vs. in the WHERE clause, see this question.
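The difference between the two LEFT JOIN examples is easy to reproduce. The sketch below uses Python's sqlite3 module with a few made-up rows; the table and column names follow the examples in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (id INTEGER PRIMARY KEY);
    CREATE TABLE tbl2 (id INTEGER PRIMARY KEY, boolfield INTEGER);
    -- Hypothetical sample data
    INSERT INTO tbl1 VALUES (1), (2), (3);
    INSERT INTO tbl2 VALUES (1, 1), (2, 0);
""")

# Condition in ON: every tbl1 row survives; unmatched rows get NULLs.
condition_in_on = conn.execute("""
    SELECT tbl1.id, tbl2.id
    FROM tbl1
    LEFT JOIN tbl2 ON tbl1.id = tbl2.id AND tbl2.boolfield = 1
    ORDER BY tbl1.id
""").fetchall()

# Condition in WHERE: the NULL-extended rows are filtered out afterwards.
condition_in_where = conn.execute("""
    SELECT tbl1.id, tbl2.id
    FROM tbl1
    LEFT JOIN tbl2 ON tbl1.id = tbl2.id
    WHERE tbl2.boolfield = 1
    ORDER BY tbl1.id
""").fetchall()

print(condition_in_on)
print(condition_in_where)
```

The first query keeps all three tbl1 rows; the second keeps only the one with a matching tbl2 row where boolfield is 1, which is what an INNER JOIN would return.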
Both versions return the same data.
Although this is true for an inner join, it is not true for outer joins.
Stylistically, there is a third possibility. In addition to your two, there is also:
select t1.field1, t2.field1
from (select t1.*
from table1 t1
where t1.boolfield = 1
) t1 inner join
table2 t2
on t1.id = t2.id
Which is preferable all depends on what you want to highlight, so you (or someone else) can later understand and modify the query. I often prefer the third version, because it emphasizes that the query is only using certain rows from the table -- the boolean condition is very close to where the table is specified.
In the other two cases, if you have a long query, it can be problematic to figure out what "t1" really means. I think this is why some people prefer to put the condition in the ON clause. Others prefer the WHERE clause.

SQL ANSI joins and the order of tables in it

The following query is automatically translated from the "old" syntax to ANSI syntax and gives an error:
select *
from ods_trf_pnb_stuf_lijst_adrsrt2 lst
join ods_stg_pnb_stuf_pers_adr pas
on (pas.soort_adres = lst.soort_adres)
right outer join ods_stg_pnb_stuf_pers_nat nat
on (prs.id = nat.prs_id) -- error: prs.id invalid identifier
join ods_stg_pnb_stuf_adr adr
on (adr.id = pas.adr_id)
join ods_stg_pnb_stuf_np prs
on (prs.id = pas.prs_id)
I guess this is because table prs is referenced before it has been declared. Moving the prs join up in the query solves the problem:
select *
from ods_trf_pnb_stuf_lijst_adrsrt2 lst
join ods_stg_pnb_stuf_pers_adr pas
on (pas.soort_adres = lst.soort_adres)
join ods_stg_pnb_stuf_np prs -- this first
on (prs.id = pas.prs_id)
right outer join ods_stg_pnb_stuf_pers_nat nat
on (prs.id = nat.prs_id) -- now prs.id is known
join ods_stg_pnb_stuf_adr adr
on (adr.id = pas.adr_id)
where lst.persoonssoort = 'PERSOON'
and pas.einddatumrelatie is null
Is there a way to write this query so that the order is less restrictive, still using the ANSI syntax?
If the broken query was generated by a tool from the old non-ANSI syntax, the tool is generating broken code. However, using ANSI-style joins should yield the same result regardless of the order of tables in the from clause. That is,
select *
from t1
join t2 on t2.id = t1.id
left join t3 on t3.id = t1.id
will give you the same results (albeit a different ordering of columns in the result set) as
select *
from t1
left join t3 on t3.id = t1.id
join t2 on t2.id = t1.id
Note that the from clause can't be reordered in such a way as to break the dependencies implied by the join criteria. However, you may also restate/refactor the from clause so as to express the query in a different way that will yield the same result set. For instance, the above query is equivalent to
select *
from t3
right join t1 on t1.id = t3.id
join t2 on t2.id = t1.id
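As a rough check of the first two orderings, here is a sketch with Python's sqlite3 module and invented rows. Explicit columns are selected so the results compare directly, and the RIGHT JOIN variant is left out because older SQLite versions do not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY);
    CREATE TABLE t3 (id INTEGER PRIMARY KEY);
    -- Hypothetical sample data
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1), (2);
    INSERT INTO t3 VALUES (2), (3);
""")

inner_then_left = conn.execute("""
    SELECT t1.id, t2.id, t3.id
    FROM t1
    JOIN t2 ON t2.id = t1.id
    LEFT JOIN t3 ON t3.id = t1.id
    ORDER BY t1.id
""").fetchall()

left_then_inner = conn.execute("""
    SELECT t1.id, t2.id, t3.id
    FROM t1
    LEFT JOIN t3 ON t3.id = t1.id
    JOIN t2 ON t2.id = t1.id
    ORDER BY t1.id
""").fetchall()

print(inner_then_left)
print(left_then_inner)
```

Both orderings work here because every ON clause refers only to t1, which is already in scope; an ON clause that mentions a table joined later would fail, as in the question.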
You simply cannot reference a table unless it has been in the join list earlier. That is normal and expected behavior. Why is this a problem?
A normal ("INNER") JOIN
SELECT ...
FROM a
JOIN b ON (a.x = b.y)
is equivalent to a SELECT with two tables and an appropriate WHERE clause
SELECT ...
FROM a, b
WHERE a.x = b.y
For left/right/outer joins, you are still handicapped by "the asymmetric" join syntax.
I think the original SQL code probably looked something like this:
select *
from ods_trf_pnb_stuf_lijst_adrsrt2 lst
, ods_stg_pnb_stuf_pers_adr pas
, ods_stg_pnb_stuf_pers_nat nat
, ods_stg_pnb_stuf_adr adr
, ods_stg_pnb_stuf_np prs
where
pas.soort_adres = lst.soort_adres
and prs.id(+) = nat.prs_id
and adr.id = pas.adr_id
and prs.id = pas.prs_id
and lst.persoonssoort = 'PERSOON'
and pas.einddatumrelatie is null
ods_stg_pnb_stuf_np prs is at the end of the from clause, which is valid with Oracle proprietary joins.
But when converting this to ANSI SQL syntax, table prs must be joined before it is referenced. This is a common mistake when converting Oracle proprietary joins to ANSI SQL syntax.
There are some other common issues when converting Oracle proprietary joins to ANSI SQL syntax:
an additional join condition goes missing;
a condition in the where clause is broken after some conditions are moved to the join clause.
If your colleague needs to rewrite Oracle proprietary joins in ANSI SQL syntax, the demos (in both Java and C#) listed in this article should be helpful.

SQL Server query performance - removing need for Hash Match (Inner Join)

I have the following query, which is doing very little and is an example of the kind of joins I am doing throughout the system.
select t1.PrimaryKeyId, t1.AdditionalColumnId
from TableOne t1
join TableTwo t2 on t1.ForeignKeyId = t2.PrimaryKeyId
join TableThree t3 on t1.PrimaryKeyId = t3.ForeignKeyId
join TableFour t4 on t3.ForeignKeyId = t4.PrimaryKeyId
join TableFive t5 on t4.ForeignKeyId = t5.PrimaryKeyId
where
t1.StatusId = 1
and t5.TypeId = 68
There are indexes on all the join columns, however the performance is not great. Inspecting the query plan reveals a lot of Hash Match (Inner Joins) when really I want to see Nested Loop joins.
The number of records in each table is as follows:
select count(*) from TableOne
= 64393
select count(*) from TableTwo
= 87245
select count(*) from TableThree
= 97141
select count(*) from TableFour
= 116480
select count(*) from TableFive
= 62
What is the best way in which to improve the performance of this type of query?
First thoughts:
Change to EXISTS (changes equi-join to semi-join)
You need to have indexes on t1.StatusId, t5.TypeId and INCLUDE t1.AdditionalColumnID
I wouldn't worry about your join method yet...
Personally, I've never used a JOIN hint. They only work for the data, indexes and statistics you have at that point in time. As these change, your JOIN hint limits the optimiser.
select t1.PrimaryKeyId, t1.AdditionalColumnId
from
TableOne t1
where
t1.StatusId = 1
AND EXISTS (SELECT *
FROM
TableThree t3
join TableFour t4 on t3.ForeignKeyId = t4.PrimaryKeyId
join TableFive t5 on t4.ForeignKeyId = t5.PrimaryKeyId
WHERE
t1.PrimaryKeyId = t3.ForeignKeyId
AND
t5.TypeId = 68)
AND EXISTS (SELECT *
FROM
TableTwo t2
WHERE
t1.ForeignKeyId = t2.PrimaryKeyId)
Index for TableOne, one of:
(StatusId, ForeignKeyId) INCLUDE (AdditionalColumnId)
(ForeignKeyId, StatusId) INCLUDE (AdditionalColumnId)
Index for TableFive: probably (TypeId, PrimaryKeyId)
Edit: updated JOINS and EXISTS to match question fixes
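The "equi-join to semi-join" point can be illustrated with a small sketch. It uses Python's sqlite3 module rather than SQL Server, a cut-down version of two of the tables, and invented rows, so it shows only the difference in semantics, not in query plans: a JOIN returns one row per match, while EXISTS only asks whether at least one match exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical versions of the tables in the question
    CREATE TABLE TableOne (PrimaryKeyId INTEGER PRIMARY KEY, StatusId INTEGER);
    CREATE TABLE TableThree (ForeignKeyId INTEGER);
    INSERT INTO TableOne VALUES (1, 1), (2, 1), (3, 0);
    INSERT INTO TableThree VALUES (1), (1), (3);
""")

# Equi-join: the TableOne row with id 1 appears once per matching TableThree row.
with_join = conn.execute("""
    SELECT t1.PrimaryKeyId
    FROM TableOne t1
    JOIN TableThree t3 ON t1.PrimaryKeyId = t3.ForeignKeyId
    WHERE t1.StatusId = 1
    ORDER BY t1.PrimaryKeyId
""").fetchall()

# Semi-join: each qualifying TableOne row appears at most once.
with_exists = conn.execute("""
    SELECT t1.PrimaryKeyId
    FROM TableOne t1
    WHERE t1.StatusId = 1
      AND EXISTS (SELECT 1
                  FROM TableThree t3
                  WHERE t1.PrimaryKeyId = t3.ForeignKeyId)
    ORDER BY t1.PrimaryKeyId
""").fetchall()

print(with_join)
print(with_exists)
```

The two forms are interchangeable only when the joined tables contribute no output columns and duplicates are not wanted, which appears to be the case for the query in the question since it selects only t1 columns.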
SQL Server is pretty good at optimizing queries, but it's also conservative: it optimizes queries for the worst case. A loop join typically results in an index lookup and a bookmark lookup for every row. Because loop joins cause dramatic degradation for large sets, SQL Server is hesitant to use them unless it's sure about the number of rows.
You can use the FORCESEEK table hint to force an index lookup:
inner join TableTwo t2 with (FORCESEEK) on t1.ForeignKeyId = t2.PrimaryKeyId
Alternatively, you can force a loop join with the loop keyword:
inner LOOP join TableTwo t2 on t1.ForeignKeyId = t2.PrimaryKeyId
Query hints limit SQL Server's freedom, so it can no longer adapt to changed circumstances. It's best practice to avoid query hints unless there is a business need that cannot be met without them.

SQL style question: INNER JOIN in FROM clause or WHERE clause?

If you are going to join multiple tables in a SQL query, where do you think is a better place to put the join statement: in the FROM clause or the WHERE clause?
If you are going to do it in the FROM clause, how do you format it so that it is clear and readable? (I'm talking about indents, newlines, whitespace in general.)
Are there any advantages/disadvantages to each?
I tend to use the FROM clause, or rather the JOIN clause itself, indenting like this (and using aliases):
SELECT t1.field1, t2.field2, t3.field3
FROM table1 t1
INNER JOIN table2 t2
ON t1.id1 = t2.id1
INNER JOIN table3 t3
ON t1.id1 = t3.id3
This keeps the join condition close to where the join is made. I find it easier to understand this way than trying to look through the WHERE clause to figure out what exactly is joined how.
When making OUTER JOINs (ANSI-89 or ANSI-92), the location of a filter matters because criteria specified in the ON clause are applied before the JOIN is made. Criteria against an OUTER JOINed table provided in the WHERE clause are applied after the JOIN is made. This can produce very different result sets.
In comparison, it doesn't matter for INNER JOINs whether the criteria are provided in the ON or WHERE clauses -- the result will be the same. That said, I strive to keep the WHERE clause clean -- anything related to JOINed tables will be in their respective ON clause. This saves hunting through the WHERE clause, which is why ANSI-92 syntax is more readable.
I prefer the FROM clause, if for no other reason than that it distinguishes conditions that merely express foreign key relationships (filtering the Cartesian product) from conditions that are logical restrictions. For example:
SELECT * FROM Products P JOIN ProductPricing PP ON P.Id = PP.ProductId
WHERE PP.Price > 10
As opposed to
SELECT * FROM Products P, ProductPricing PP
WHERE P.Id = PP.ProductID AND Price > 10
I can look at the first one and instantly know that the only logical restriction I'm placing is the price, as opposed to the implicit machinery of joining tables together on the relationship key.
I almost always use the ANSI 92 joins because it makes it clear that these conditions are for JOINING.
Typically I write it this way
FROM
foo f
INNER JOIN bar b
ON f.id = b.id
Sometimes I write it this way when it's trivial
FROM
foo f
INNER JOIN bar b ON f.id = b.id
INNER JOIN baz b2 ON b.id = b2.id
When it's not trivial, I do it the first way
e.g.
FROM
foo f
INNER JOIN bar b
ON f.id = b.id
and b.type = 1
or
FROM
foo f
INNER JOIN (
SELECT max(date) date, id
FROM foo
GROUP BY
id) lastF
ON f.id = lastF.id
and f.date = lastF.Date
Or the really weird case (not sure I got the parens right, but it's supposed to be a LEFT join to table bar, where bar needs an inner join to baz)
FROM
foo f
LEFT JOIN (bar b
INNER JOIN baz b2
ON b.id = b2.id
)ON f.id = b.id
You should put joins in Join clauses which means the From clause. A different question could be had about where to put filtering statements.
With respect to indenting, there are many styles. My preference is to indent related joins and keep main clauses like Select, From, Where, Group By, Having and Order By indented at the same level. In addition, I put each of these main attributes and the first line of an On clause on its own line.
Select ..
From Table1
Join Table2
On Table2.FK = Table1.PK
And Table2.OtherCol = '12345'
And Table2.OtherCol2 = 9876
Left Join (Table3
Join Table4
On Table4.FK = Table3.PK)
On Table3.FK = Table2.PK
Where ...
Group By ...
Having ...
Order By ...
Use the FROM clause to be compliant with ANSI-92 standards.
This:
select *
from a
inner join b
on a.id = b.id
where a.SomeColumn = 'x'
Not this:
select *
from a, b
where a.id = b.id
and a.SomeColumn = 'x'
I definitely always do my JOINS (of whatever type) in my FROM clause.
The way I indent them is this:
SELECT fields
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.t1_id
INNER JOIN table3 t3 ON t1.id = t3.t1_id
AND
t2.id = t3.t2_id
In fact, I'll generally go a step farther and move as much of my constraining logic from the WHERE clause to the FROM clause, because this (at least in MS SQL) front-loads the constraint, meaning that it reduces the size of the recordset sooner in the query construction (I've seen documentation that contradicts this, but my execution plans are invariably more efficient when I do it this way).
For example, if I wanted to select only the things in the above query where t3.id = 3, I could put that in the WHERE clause, or I could do it this way:
SELECT fields
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.t1_id
INNER JOIN table3 t3 ON t1.id = t3.t1_id
AND
t2.id = t3.t2_id
AND
t3.id = 3
I personally find queries laid out in this way to be very readable and maintainable, but this is certainly a matter of personal preference, so YMMV.
Regardless, I hope this helps.
ANSI joins. I omit any optional keywords from the SQL as they only add noise to the equation. There's no such thing as a left inner join, is there? And by default, a simple join is an inner join, so there's no particular point to saying 'inner join'.
Then I column align things as much as possible.
The point is that a large, complex SQL query can be very difficult to comprehend, so the more order that is imposed on it to make it readable, the better. Anybody looking at the query to fix, modify or tune it needs to be able to answer a few things right off the bat:
what tables/views are involved in the query?
what are the criteria for each join? What's the cardinality of each join?
what/how many columns are returned by the query?
I like to write my queries so they look something like this:
select PatientID = rpt.ipatientid ,
EventDate = d.dEvent ,
Side = d.cSide ,
OutsideHistoryDate = convert(nchar, d.devent,112) ,
Outcome = p.cOvrClass ,
ProcedureType = cat.ctype ,
ProcedureCategoryMajor = cat.cmajor ,
ProcedureCategoryMinor = cat.cminor
from dbo.procrpt rpt
join dbo.procd d on d.iprocrptid = rpt.iprocrptid
join dbo.proclu lu on lu.iprocluid = d.iprocluid
join dbo.pathlgy p on p.iProcID = d.iprocid
left join dbo.proccat cat on cat.iproccatid = lu.iproccatid
where rpt.ipatientid = @iPatientID