Factor where clauses into subqueries - sql

I'm wondering if it's always possible in SQL to factor a where condition through a join into a subquery. For instance, if I have
select ... from a join b on ... where p and q
and p pertains only to a and q only to b, can I always rewrite it as the following?
select ... from (select ... from a where p) as a join (select ... from b where q) as b on ...
Thanks!
[Notes: 1) I'm using postgres in case this affects the answer. 2) Readability is not an important consideration, as these are automatically generated queries. Edit: 3) I'm not only interested in inner join but other joins as well.]

In general the query 1:
SELECT ...
FROM TableA
JOIN TableB ON <SomeForeignKey>
JOIN TableC ON <SomeForeignKey>
WHERE <SomeConditionOnTableA> AND
<SomeConditionOnTableB> AND
<SomeConditionOnTableC>
... is equivalent to the query 2:
SELECT ...
FROM TableA
JOIN TableB ON <SomeForeignKey> AND <SomeConditionOnTableB>
JOIN TableC ON <SomeForeignKey> AND <SomeConditionOnTableC>
WHERE <SomeConditionOnTableA>
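To check the inner-join claim on a concrete case, here is a minimal sketch that runs the WHERE form, the ON form, and the subquery form from the question against the same data. It uses Python's sqlite3 module as a stand-in engine (an assumption; the question is about Postgres), and the tables a and b with their columns and rows are made up for illustration.

```python
import sqlite3

# Hypothetical toy tables; names and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, somefield TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'whatever'), (2, 'whatever'), (3, 'other');
    INSERT INTO b VALUES (10, 1, 'x'), (11, 1, NULL), (12, 3, 'y');
""")

# Query 1 format: every filter in the WHERE clause.
where_form = conn.execute("""
    SELECT a.id, b.id FROM a
    JOIN b ON b.a_id = a.id
    WHERE a.somefield = 'whatever' AND b.name IS NOT NULL
    ORDER BY a.id, b.id
""").fetchall()

# Query 2 format: the filter on b moved into the ON clause.
on_form = conn.execute("""
    SELECT a.id, b.id FROM a
    JOIN b ON b.a_id = a.id AND b.name IS NOT NULL
    WHERE a.somefield = 'whatever'
    ORDER BY a.id, b.id
""").fetchall()

# The question's format: each filter pushed into its own subquery.
subquery_form = conn.execute("""
    SELECT a.id, b.id
    FROM (SELECT * FROM a WHERE somefield = 'whatever') AS a
    JOIN (SELECT * FROM b WHERE name IS NOT NULL) AS b ON b.a_id = a.id
    ORDER BY a.id, b.id
""").fetchall()

print(where_form, on_form, subquery_form)
```

With an inner join all three forms return the same rows for this data.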
But the same is not true if instead of (INNER) JOINs you use OUTER JOINs. With OUTER JOINs, query 1 still returns the same rows as its INNER JOIN version when the WHERE conditions are very simple ones that can only match NOT NULL column values, like:
name='value'
name LIKE '%value%'
number < const
field IN (...)
Notice that these are all conditions that make the OUTER JOINs moot anyway, as they filter out rows that have NULL values in the involved columns, so they also filter out the rows that the OUTER JOIN added when it retrieved nothing from the joined table. Moving such a condition into the ON clause (query 2) does not filter those rows out, so with OUTER JOINs the two formats can return different results.
The same care is needed if you use OUTER JOINs and start comparing column values with NULLs or comparing expressions that may involve NULLs.
For example, taking this query (formatted as query 1):
SELECT ...
FROM TableA a
LEFT JOIN TableB b ON <SomeForeignKey>
LEFT JOIN TableC c ON <SomeForeignKey>
WHERE a.somefield = 'whatever'
AND b.name IS NOT NULL
AND c.somenumber >100
In this case the filter is applied after the OUTER JOIN has been resolved, so it eliminates both the rows that exist in TableB with a NULL name and the rows that were added by the OUTER JOIN when it found no matching row in TableB. This is not equivalent to the query 2 format:
SELECT ...
FROM TableA a
LEFT JOIN TableB b ON <SomeForeignKey> AND b.name IS NOT NULL
LEFT JOIN TableC c ON <SomeForeignKey> AND c.somenumber >100
WHERE a.somefield = 'whatever'
In this case the filter is applied to TableB before resolving the OUTER JOIN. TableB rows that have a NULL name are eliminated by the filter, but the TableA rows they would have joined to are still returned by the LEFT JOIN, with NULLs in the TableB columns. So this query might contain rows that the former does not.
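The difference can be seen on a small data set. The following is a sketch only: it uses Python's sqlite3 module as a stand-in engine (an assumption), and the tables a and b and their rows are hypothetical. It also runs the subquery form from the question, which for a LEFT JOIN behaves like the ON form rather than the WHERE form.

```python
import sqlite3

# Hypothetical toy tables; names and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, somefield TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'whatever'), (2, 'whatever'), (3, 'other');
    INSERT INTO b VALUES (10, 1, 'x'), (11, 1, NULL), (12, 3, 'y');
""")

# Filter in WHERE: applied after the outer join, so NULL-extended rows are dropped.
where_form = conn.execute("""
    SELECT a.id, b.id FROM a
    LEFT JOIN b ON b.a_id = a.id
    WHERE a.somefield = 'whatever' AND b.name IS NOT NULL
    ORDER BY a.id, b.id
""").fetchall()

# Filter in ON: applied while matching, so unmatched rows of a are kept.
on_form = conn.execute("""
    SELECT a.id, b.id FROM a
    LEFT JOIN b ON b.a_id = a.id AND b.name IS NOT NULL
    WHERE a.somefield = 'whatever'
    ORDER BY a.id, b.id
""").fetchall()

# Filter in a subquery on b: b is reduced before the outer join, like the ON form.
subquery_form = conn.execute("""
    SELECT a.id, b.id
    FROM (SELECT * FROM a WHERE somefield = 'whatever') AS a
    LEFT JOIN (SELECT * FROM b WHERE name IS NOT NULL) AS b ON b.a_id = a.id
    ORDER BY a.id, b.id
""").fetchall()

print(where_form)
print(on_form)
print(subquery_form)
```

Here the WHERE form loses the row for a.id = 2, which has no match in b, while the ON and subquery forms keep it with a NULL in place of b.id.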

I would say yes, I can't think of a situation where it is not possible. WHERE in itself can be replaced with a join:
select ... from A where x=10
<=>
select ... from A join ( values (10) ) B (x) on A.x = B.x
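As a quick check of this equivalence, here is a minimal sketch using Python's sqlite3 module (an assumption, chosen only because it runs anywhere). SQLite does not accept the B (x) column-alias list used above, so the one-row relation is written as (SELECT 10 AS x) instead; table A and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, x INTEGER);
    INSERT INTO A VALUES (1, 5), (2, 10), (3, 10), (4, 20);
""")

# Plain WHERE filter.
where_rows = conn.execute(
    "SELECT id FROM A WHERE x = 10 ORDER BY id"
).fetchall()

# Same filter expressed as a join against a one-row relation.
join_rows = conn.execute("""
    SELECT A.id FROM A
    JOIN (SELECT 10 AS x) AS B ON A.x = B.x
    ORDER BY A.id
""").fetchall()

print(where_rows, join_rows)
```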
Perhaps off topic, but for transformations in general Vadim Tropashko (http://arxiv.org/abs/cs/0501053) shows that it is possible to reduce the set of classic relational algebra operators to two binary operations: natural join and generalized union.

Related

Always return value from left table in join

I have 2 tables
A and B
A has cols(AKey, val1, val2)
B has Cols(BKey,Akey, ValX, valY)
I have the following query:
select a.Val1,a.Val2,b.ValX
from A
Left Join B on a.AKey = b.Akey
where a.Akey ={someValue}
and ((b.valY ={aDifferentVal}) or (b.valY is NULL))
The situation is that I always want to return the values from table A.
This works when {aDifferentVal} exists in the join, and it also works when there are no rows in table B for the join. However, when there are rows in table B for the join but none of them has {aDifferentVal}, the query returns nothing, and I still want the values from table A.
How can I achieve this?
Just move the condition on the left joined table from the where clause to the on clause of the join. Otherwise the condition becomes mandatory, and rows where it is not fulfilled are filtered out (here this removes rows that match but whose valy does not match {adifferentval}):
select a.val1,a.val2,b.valx
from a
left join b
on b.akey = a.akey
and b.valy = {adifferentval}
where a.akey = {somevalue}
Move the conditions on the second table to the on clause:
select a.Val1,a.Val2,b.ValX
from A Left Join
B
on a.AKey = b.Akey and (b.valY ={aDifferentVal})
where a.Akey = {someValue}
Filtering in the where clause (sort of) turns the outer join into an inner join. Your version is slightly better, because it is checking for NULL. However, the rows that match are:
The A values that have no match in B at all.
The A values that match the condition you specify.
What gets filtered out are A values that match a row in B but none of the matches have the condition you specify.
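The three cases can be reproduced with a small sketch. It uses Python's sqlite3 module as a stand-in engine (an assumption, since the question does not name a database), and the rows, the key values 1 to 3, and the wanted valY of 5 are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (AKey INTEGER PRIMARY KEY, val1 TEXT, val2 TEXT);
    CREATE TABLE B (BKey INTEGER PRIMARY KEY, AKey INTEGER, ValX TEXT, ValY INTEGER);
    INSERT INTO A VALUES (1, 'a1', 'a2'), (2, 'b1', 'b2'), (3, 'c1', 'c2');
    -- AKey 1 has a B row with the wanted ValY, AKey 2 only has another ValY,
    -- AKey 3 has no B rows at all.
    INSERT INTO B VALUES (10, 1, 'x1', 5), (11, 2, 'x2', 7);
""")

# The question's version: condition on B in the WHERE clause.
WHERE_FORM = """
    SELECT a.val1, a.val2, b.ValX
    FROM A a LEFT JOIN B b ON a.AKey = b.AKey
    WHERE a.AKey = :akey AND (b.ValY = :wanted OR b.ValY IS NULL)
"""

# The answers' version: condition on B moved into the ON clause.
ON_FORM = """
    SELECT a.val1, a.val2, b.ValX
    FROM A a LEFT JOIN B b ON a.AKey = b.AKey AND b.ValY = :wanted
    WHERE a.AKey = :akey
"""

def rows(sql, akey, wanted=5):
    """Run one of the two query forms for a single AKey."""
    return conn.execute(sql, {"akey": akey, "wanted": wanted}).fetchall()

where_result = {k: rows(WHERE_FORM, k) for k in (1, 2, 3)}
on_result = {k: rows(ON_FORM, k) for k in (1, 2, 3)}
print(where_result)
print(on_result)
```

Only AKey 2 differs: the WHERE form returns nothing for it, while the ON form returns the A values with NULL for ValX.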

Exclude using joins without subquery

How do I convert the following code to get the same results using a join (without using a subquery)?
select a_key from table_a a
inner join table_b b --in my code I've 5 joins like that
on a.a_key=b.a_key
where a_key not in
(select a_key from table_c --and conditions within this brackets also
where var_a between table_c.col1 and table_c.col2
or var_b between table_c.col1 and table_c.col2
)
The following is essentially the same logic:
select a.a_key
from table_a a inner join
table_b b
on a.a_key = b.a_key left join
table_c c
on (var_a between c.col1 and c.col2 or
var_b between c.col1 and c.col2
) and
a.a_key = c.a_key
where c.a_key is null;
You should prefix your columns with table aliases. The column a_key is ambiguous in your original, as are the column var_a and var_b.
These are slightly different if any matching table_c.a_key values are NULL. In that case, the join version probably behaves more like you would expect.
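To illustrate the NULL caveat, here is a minimal sketch run through Python's sqlite3 module (an assumption; the question does not name a database). It treats var_a and var_b as columns of table_a, which is a guess since the original leaves them unqualified, and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (a_key INTEGER, var_a INTEGER, var_b INTEGER);
    CREATE TABLE table_b (a_key INTEGER);
    CREATE TABLE table_c (a_key INTEGER, col1 INTEGER, col2 INTEGER);
    INSERT INTO table_a VALUES (1, 5, 50), (2, 15, 60), (3, 100, 200);
    INSERT INTO table_b VALUES (1), (2), (3);
    INSERT INTO table_c VALUES (2, 10, 20);
""")

# Subquery version (var_a and var_b assumed to belong to table_a).
NOT_IN = """
    SELECT a.a_key FROM table_a a
    JOIN table_b b ON a.a_key = b.a_key
    WHERE a.a_key NOT IN (
        SELECT c.a_key FROM table_c c
        WHERE a.var_a BETWEEN c.col1 AND c.col2
           OR a.var_b BETWEEN c.col1 AND c.col2)
    ORDER BY a.a_key
"""

# Join version: keep only rows of a that found no partner in table_c.
ANTI_JOIN = """
    SELECT a.a_key FROM table_a a
    JOIN table_b b ON a.a_key = b.a_key
    LEFT JOIN table_c c
      ON (a.var_a BETWEEN c.col1 AND c.col2 OR
          a.var_b BETWEEN c.col1 AND c.col2)
     AND a.a_key = c.a_key
    WHERE c.a_key IS NULL
    ORDER BY a.a_key
"""

def keys(sql):
    """Return the a_key values produced by a query as a flat list."""
    return [row[0] for row in conn.execute(sql)]

before = (keys(NOT_IN), keys(ANTI_JOIN))

# A table_c row with a NULL key whose range matches every row of table_a.
conn.execute("INSERT INTO table_c VALUES (NULL, 0, 1000)")
after = (keys(NOT_IN), keys(ANTI_JOIN))

print(before)
print(after)
```

Without NULL keys both versions agree. Once the subquery can return a NULL, NOT IN evaluates to unknown for every row and the subquery version returns nothing, while the join version is unaffected.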

What does "table A left outer join table B ON TRUE" mean?

I know conditions are used in table joining, but I came across a specific situation where the SQL code is written like "Table A join table B ON TRUE".
What will happen based on the "ON TRUE" condition? Is that just a total cross join without any condition selection?
Actually, the original expression is like:
Table A LEFT outer join table B on TRUE
Let's say A has m rows and B has n rows. Is there any conflict between "left outer join" and "on true"? Because it seems "on true" results in a cross join.
From what I guess, the result will be m*n rows. So, it has no need to write "left outer join", just a "join" will give the same output, right?
Yes. That's the same thing as a CROSS JOIN.
In MySQL, we can omit the [optional] CROSS keyword. We can also omit the ON clause.
The condition in the ON clause is evaluated as a boolean, so we could also have written something like ON 1=1.
UPDATE:
(The question was edited, to add another question about a LEFT [OUTER] JOIN b which is different than the original construct: a JOIN b)
The "LEFT [OUTER] JOIN" is slightly different, in that rows from the table on the left side will be returned even when there are no matching rows found in the table on the right side.
As noted, a CROSS JOIN between tables a (containing m rows) and table b containing n rows, absent any other predicates, will produce a resultset of m x n rows.
The LEFT [OUTER] JOIN will produce a different resultset in the special case where table b contains 0 rows.
CREATE TABLE a (i INT);
CREATE TABLE b (i INT);
INSERT INTO a VALUES (1),(2),(3);
SELECT a.i, b.i FROM a LEFT JOIN b ON TRUE ;
Note that the LEFT JOIN will return rows from table a (a total of m rows) even when table b contains 0 rows.
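The empty-table case can be checked with a short sketch. It runs the same tables through Python's sqlite3 module (an assumption; the answer above is written for MySQL) and writes the always-true condition as ON 1 = 1. The rows later inserted into b are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (i INT);
    CREATE TABLE b (i INT);
    INSERT INTO a VALUES (1),(2),(3);
""")

LEFT = "SELECT a.i, b.i FROM a LEFT JOIN b ON 1 = 1 ORDER BY a.i, b.i"
CROSS = "SELECT a.i, b.i FROM a CROSS JOIN b ORDER BY a.i, b.i"

# b is empty: the two joins disagree.
left_empty = conn.execute(LEFT).fetchall()
cross_empty = conn.execute(CROSS).fetchall()

# b has rows: both joins return the full m x n product.
conn.execute("INSERT INTO b VALUES (10), (20)")
left_full = conn.execute(LEFT).fetchall()
cross_full = conn.execute(CROSS).fetchall()

print(left_empty, cross_empty)
print(len(left_full), len(cross_full))
```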
A cross join produces a cartesian product between the two tables, returning all possible combinations of all rows. It has no on clause because you're just joining everything to everything.
A cross join does not match rows up: if you have 100 rows in each table with a 1-to-1 match, you get 10,000 results, whereas an inner join will only return 100 rows in the same situation.
These 2 examples will return the same result:
Cross join
select * from table1 cross join table2 where table1.id = table2.fk_id
Inner join
select * from table1 join table2 on table1.id = table2.fk_id
Use the last method
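A small sketch of the two forms side by side, using Python's sqlite3 module as a stand-in engine (an assumption). The three-row tables are hypothetical and set up with a 1-to-1 match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (fk_id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1), (2), (3);
""")

# Unfiltered cross join: every combination of rows.
all_pairs = conn.execute(
    "SELECT * FROM table1 CROSS JOIN table2"
).fetchall()

# Cross join filtered in WHERE.
cross_filtered = conn.execute("""
    SELECT * FROM table1 CROSS JOIN table2
    WHERE table1.id = table2.fk_id
    ORDER BY table1.id
""").fetchall()

# Inner join with the same condition in ON.
inner = conn.execute("""
    SELECT * FROM table1 JOIN table2
    ON table1.id = table2.fk_id
    ORDER BY table1.id
""").fetchall()

print(len(all_pairs), cross_filtered, inner)
```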
The join syntax's general form:
SELECT *
FROM table_a
JOIN table_b ON condition
The condition is used to tell the database how to match rows from table_a to table_b, and would usually look like table_a.some_id = table_b.some_id.
If you just specify true, you will match every row from table_a with every row of table_b, so if table_a contains n rows and table_b contains m rows the result would have m*n rows.
Most(?) modern databases have a cleaner syntax for this, though:
SELECT *
FROM table_a
CROSS JOIN table_b
The difference between the pure cross join and the left join (where the condition is forced to be always true, as when using ON TRUE) only shows up when the right table is empty: the cross join then returns no rows, while the left join still returns each of the left table's rows next to a bunch of NULLs where the right table's columns would have been.

Need help with a sql query that has an inner and outer join

I really need help getting this query right. I can't share actual table and column names, but will try my best to layout the problem simply.
Assume the following tables. The tables and keys CANNOT be changed. Period. I don't care if you think it's a bad design, this question isn't a design question, it's on SQL syntax.
Table A - Primary key named id1
Table B - Contains two foreign keys, TableA.id1 and Foo.id2(ignore Foo, it doesn't matter for this)
Table C - Contains two foreign keys, TableA.id1 and Foo.id2, additional interesting columns.
Constraints:
The SQL gets a set of id1s passed in as an argument.
It must return a list of Table C rows.
It must only return Table C rows where a Table B row exists with a matching TableA.id1 and Foo.id2 - There ARE rows in Table C that don't match Table B
A row MUST be returned for every id1 passed in, even if no Table C row exists.
At first I tried a Left Outer Join from Table A to Table B then an Inner Join to Table C. That violates the 4th rule above, as the Inner Join drops out those rows.
Next I tried two Left Outer joins. This is closer, but has the side effect of including rows that match the Table A join to Table B, but don't have a corresponding Table C entry, which isn't what I want.
So, here's what I came up with.
SELECT
a.id1,
c.*
FROM
TableB b
INNER JOIN
TableC c USING (id1,id2)
RIGHT OUTER JOIN
TableA a USING (id1)
WHERE
a.id1 in (x,y,z)
I'm a bit wary of a Right Outer Join, as the documentation I've read says it can be replaced with a Left Outer, but it doesn't appear so for this case. It also seems a bit rare, which is making other devs nervous, so I'm being cautious.
So, three questions in one.
Is this correct?
Did I use the Right Outer Join correctly?
Is there a cleaner way to achieve the same thing?
EDIT: DB is MySQL
You can rewrite it as a LEFT OUTER JOIN by using parentheses. In pseudo-SQL change this:
SELECT ...
FROM b
INNER JOIN c ON ...
RIGHT OUTER JOIN a ON ...
to this:
SELECT ...
FROM a
LEFT OUTER JOIN (
b INNER JOIN c ON ...
) ON ...
You can use an EXISTS clause, which sometimes works better
SELECT
a.id1,
c.*
FROM TableA a
LEFT JOIN TableC c
ON c.id1 = a.id1 AND EXISTS (
select *
from TableB b
where b.id1=c.id1 and b.id2=c.id2)
WHERE
a.id1 in (x,y,z)
As you have written it, it works because ANSI JOINs are always processed top to bottom. Since you need to test B against C before joining to A, it is about the only way to write it without introducing a subquery [(B x C) RIGHT JOIN A]. However, a bad query plan could join all records in B and C (B x C) before right joining to A.
The EXISTS method efficiently uses the filter on A, then LEFT JOINs to C and for each C found, validates that it also exists in B (or discards).
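Here is a minimal sketch of the EXISTS form on toy data, next to a variant that joins B to C in a derived table first. It runs on Python's sqlite3 module rather than MySQL (an assumption), the info column and all rows are hypothetical, and (1, 2, 3) stands in for the passed-in ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id1 INTEGER PRIMARY KEY);
    CREATE TABLE TableB (id1 INTEGER, id2 INTEGER);
    CREATE TABLE TableC (id1 INTEGER, id2 INTEGER, info TEXT);
    INSERT INTO TableA VALUES (1), (2), (3);
    INSERT INTO TableB VALUES (1, 100), (2, 200);
    INSERT INTO TableC VALUES
        (1, 100, 'keep'),        -- has a matching TableB row
        (1, 999, 'no B match'),  -- id2 not present in TableB
        (3, 300, 'no B row');    -- TableB has nothing for id1 = 3
""")

# EXISTS form: C rows only join when a matching B row exists.
exists_form = conn.execute("""
    SELECT a.id1, c.id2, c.info
    FROM TableA a
    LEFT JOIN TableC c
      ON c.id1 = a.id1
     AND EXISTS (SELECT 1 FROM TableB b
                 WHERE b.id1 = c.id1 AND b.id2 = c.id2)
    WHERE a.id1 IN (1, 2, 3)
    ORDER BY a.id1
""").fetchall()

# Derived-table form: inner join B to C first, then left join the result to A.
derived_form = conn.execute("""
    SELECT a.id1, bc.id2, bc.info
    FROM TableA a
    LEFT JOIN (
        SELECT c.id1 AS id1, c.id2 AS id2, c.info AS info
        FROM TableB b
        JOIN TableC c ON c.id1 = b.id1 AND c.id2 = b.id2
    ) AS bc ON bc.id1 = a.id1
    WHERE a.id1 IN (1, 2, 3)
    ORDER BY a.id1
""").fetchall()

print(exists_form)
print(derived_form)
```

Both forms return one row per requested id1, with NULLs where no Table C row passes the Table B check.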
Q's
Yes your query is correct
Yes
EXISTS should work better
Yeah, you need to start with TableA and then add tables B and C using joins. The only reason you even need TableA is to make sure you have a row for each parameter.
Select a.id1,c.*
From
TableA a
Left Join TableB b on a.id1=b.id1
Left Join TableC c on b.id1=c.id1 and b.id2=c.id2
Where a.id1 in (x,y,z)
You need to do OUTER joins all the way across, or rows that are missing in B will also cause data from A to be filtered out of the result set. By joining C to B (instead of directly to A) you are using B to filter. You could do it with a complicated EXISTS clause, but this is cleaner.

How can I implement SQL INTERSECT and MINUS operations in MS Access

I have researched and haven't found a way to run INTERSECT and MINUS operations in MS Access. Does any way exist?
INTERSECT is an inner join. MINUS is an outer join, where you choose only the records that don't exist in the other table.
INTERSECT
select distinct
a.*
from
a
inner join b on a.id = b.id
MINUS
select distinct
a.*
from
a
left outer join b on a.id = b.id
where
b.id is null
If you edit your original question and post some sample data then an example can be given.
EDIT: Forgot to add in the distinct to the queries.
INTERSECT is NOT an INNER JOIN. They're different. An INNER JOIN will give you duplicate rows in cases where INTERSECT will not. You can get equivalent results by:
SELECT DISTINCT a.*
FROM a
INNER JOIN b
on a.PK = b.PK
Note that PK must be the primary key column or columns. If there is no PK on the table (BAD!), you must write it like so:
SELECT DISTINCT a.*
FROM a
INNER JOIN b
ON a.Col1 = b.Col1
AND a.Col2 = b.Col2
AND a.Col3 = b.Col3 ...
With MINUS, you can do the same thing, but with a LEFT JOIN, and a WHERE condition checking for null on one of table b's non-nullable columns (preferably the primary key).
SELECT DISTINCT a.*
FROM a
LEFT JOIN b
on a.PK = b.PK
WHERE b.PK IS NULL
That should do it.
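As a sanity check, the join-based emulations can be compared with native set operators in an engine that has them. This sketch uses Python's sqlite3 module (an assumption; it is not MS Access), where the MINUS operator is spelled EXCEPT. The single-column tables a and b are hypothetical, so that matching on the key is the same as matching whole rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (pk INTEGER PRIMARY KEY);
    CREATE TABLE b (pk INTEGER PRIMARY KEY);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

def q(sql):
    """Run a query and return all rows."""
    return conn.execute(sql).fetchall()

# INTERSECT emulated with DISTINCT + INNER JOIN on the key.
intersect_join = q("""
    SELECT DISTINCT a.pk FROM a
    INNER JOIN b ON a.pk = b.pk
    ORDER BY a.pk
""")
intersect_native = q("SELECT pk FROM a INTERSECT SELECT pk FROM b ORDER BY pk")

# MINUS emulated with DISTINCT + LEFT JOIN ... IS NULL.
minus_join = q("""
    SELECT DISTINCT a.pk FROM a
    LEFT JOIN b ON a.pk = b.pk
    WHERE b.pk IS NULL
    ORDER BY a.pk
""")
minus_native = q("SELECT pk FROM a EXCEPT SELECT pk FROM b ORDER BY pk")

print(intersect_join, intersect_native)
print(minus_join, minus_native)
```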
They're done through JOINs. The old fashioned way :)
For INTERSECT, you can use an INNER JOIN. Pretty straightforward. Just need to use a GROUP BY or DISTINCT if you don't have a pure one-to-one relationship going on. Otherwise, as others had mentioned, you can get more results than you'd expect.
For MINUS, you can use a LEFT JOIN and use the WHERE to limit it so you're only getting back rows from your main table that don't have a match with the LEFT JOINed table.
Easy peasy.
Unfortunately MINUS is not supported in MS Access - one workaround would be to create three queries, one with the full dataset, one that pulls the rows you want to filter out, and a third that left joins the two tables and only pulls records that only exist in your full dataset.
Same thing goes for INTERSECT, except you would be doing it via an inner join and only returning records that exist in both.
No MINUS in Access, but you can use a subquery.
SELECT DISTINCT a.*
FROM a
WHERE a.PK NOT IN (SELECT DISTINCT b.pk FROM b)
I believe this one does the MINUS
SELECT DISTINCT
a.CustomerID,
b.CustomerID
FROM
tblCustomers a
LEFT JOIN
[Copy Of tblCustomers] b
ON
a.CustomerID = b.CustomerID
WHERE
b.CustomerID IS NULL