Difference between "and" and "where" in joins - sql

What's the difference between
SELECT DISTINCT field1
FROM table1 cd
JOIN table2
ON cd.Company = table2.Name
and table2.Id IN (2728)
and
SELECT DISTINCT field1
FROM table1 cd
JOIN table2
ON cd.Company = table2.Name
where table2.Id IN (2728)
Both return the same result and both have the same EXPLAIN output.

Firstly there is a semantic difference. When you have a join, you are saying that the relationship between the two tables is defined by that condition. So in your first example you are saying that the tables are related by cd.Company = table2.Name AND table2.Id IN (2728). When you use the WHERE clause, you are saying that the relationship is defined by cd.Company = table2.Name and that you only want the rows where the condition table2.Id IN (2728) applies. Even though these give the same answer, it means very different things to a programmer reading your code.
In this case, the WHERE clause is almost certainly what you mean so you should use it.
Secondly, there is actually a difference in the result if you use a LEFT JOIN instead of an INNER JOIN. If you include the second condition as part of the join, you will still get a result row when the condition fails: you will get values from the left table and NULLs for the right table. If you include the condition as part of the WHERE clause and that condition fails, you won't get the row at all.
Here is an example to demonstrate this.
Query 1 (WHERE):
SELECT DISTINCT field1
FROM table1 cd
LEFT JOIN table2
ON cd.Company = table2.Name
WHERE table2.Id IN (2728);
Result:
field1
200
Query 2 (AND):
SELECT DISTINCT field1
FROM table1 cd
LEFT JOIN table2
ON cd.Company = table2.Name
AND table2.Id IN (2728);
Result:
field1
100
200
Test data used:
CREATE TABLE table1 (Company NVARCHAR(100) NOT NULL, Field1 INT NOT NULL);
INSERT INTO table1 (Company, Field1) VALUES
('FooSoft', 100),
('BarSoft', 200);
CREATE TABLE table2 (Id INT NOT NULL, Name NVARCHAR(100) NOT NULL);
INSERT INTO table2 (Id, Name) VALUES
(2727, 'FooSoft'),
(2728, 'BarSoft');
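To reproduce the two results above without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The choice of SQLite is my assumption (the original test data looks like it was written for another DBMS); SQLite accepts NVARCHAR as a type name, and ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# In-memory database loaded with the test data from the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (Company NVARCHAR(100) NOT NULL, Field1 INT NOT NULL);
INSERT INTO table1 (Company, Field1) VALUES ('FooSoft', 100), ('BarSoft', 200);
CREATE TABLE table2 (Id INT NOT NULL, Name NVARCHAR(100) NOT NULL);
INSERT INTO table2 (Id, Name) VALUES (2727, 'FooSoft'), (2728, 'BarSoft');
""")

# Condition in WHERE: applied after the outer join, so the NULL-extended
# FooSoft row is filtered out.
where_rows = conn.execute("""
    SELECT DISTINCT field1
    FROM table1 cd
    LEFT JOIN table2 ON cd.Company = table2.Name
    WHERE table2.Id IN (2728)
    ORDER BY field1
""").fetchall()

# Condition in ON: part of the match test, so FooSoft simply has no match
# and is kept with NULLs for the table2 columns.
and_rows = conn.execute("""
    SELECT DISTINCT field1
    FROM table1 cd
    LEFT JOIN table2 ON cd.Company = table2.Name AND table2.Id IN (2728)
    ORDER BY field1
""").fetchall()

print(where_rows)  # [(200,)]
print(and_rows)    # [(100,), (200,)]
```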

SQL comes from relational algebra.
One way to look at the difference is that JOINs are operations on sets that can produce more or fewer records in the result than you had in the original tables. WHERE, on the other hand, will always restrict the number of results.
The rest of the text is extra explanation.
For an overview of join types, see the article again.
When I said that the where condition will always restrict the results, you have to take into account that when we are talking about queries on two (or more) tables you have to somehow pair records from these tables even if there is no JOIN keyword.
So in SQL if the tables are simply separated by a comma, you are actually using a CROSS JOIN (cartesian product) which returns every row from one table for each row in the other.
And since this is a maximum number of combinations of rows from two tables then the results of any WHERE on cross joined tables can be expressed as a JOIN operation.
But hold on, there are exceptions to this maximum when you introduce LEFT, RIGHT and FULL OUTER joins.
LEFT JOIN joins records from the left table with records from the right table on the given criteria, BUT if, for a row from the left table, the join criteria are not satisfied by any record in the right table, the LEFT JOIN still returns a record from the left table, with NULLs in the columns that would come from the right table. (RIGHT JOIN works similarly but from the other side; FULL OUTER works like both at the same time.)
Since the default cross join does NOT return those records, you cannot express these join criteria with a WHERE condition and are forced to use JOIN syntax. (Oracle was an exception to this, with an extension to the SQL standard and to the = operator, but this was not accepted by other vendors or by the standard.)
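The points above can be checked with a small sketch. The tables a and b and their contents are made up for illustration, and SQLite (through Python's sqlite3 module) is just a convenient engine to run it on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (x INT);
CREATE TABLE b (x INT);
INSERT INTO a VALUES (1), (2), (3);
INSERT INTO b VALUES (2), (3), (4);
""")

def rows(sql):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Comma-separated tables: a cross join, every row of a paired with every row of b.
cross = rows("SELECT a.x, b.x FROM a, b")

# A WHERE on the cross join gives the same rows as an INNER JOIN ... ON.
comma_where = rows("SELECT a.x, b.x FROM a, b WHERE a.x = b.x ORDER BY a.x")
inner = rows("SELECT a.x, b.x FROM a JOIN b ON a.x = b.x ORDER BY a.x")

# LEFT JOIN also keeps the unmatched row from a, padded with NULL (None in Python).
# That row does not exist in the cross join, so no WHERE on it can produce it.
left = rows("SELECT a.x, b.x FROM a LEFT JOIN b ON a.x = b.x ORDER BY a.x")

print(len(cross))   # 9
print(comma_where)  # [(2, 2), (3, 3)]
print(inner)        # [(2, 2), (3, 3)]
print(left)         # [(1, None), (2, 2), (3, 3)]
```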
Also, joins usually, but not always, coincide with existing referential integrity and suggest relationships between entities. I would not put too much weight on that, though, since WHERE conditions can do the same (except in the aforementioned case), and to a good RDBMS it makes no difference where you specify your criteria.

The join is used to reflect the entity relations; the where clause filters down results.
So the join clauses are 'static' (unless the entity relations change), while the where clauses are use-case specific.

There is no difference. "ON" is like a synonym for "WHERE", so the second kind of reads like:
JOIN table2 WHERE cd.Company = table2.Name AND table2.Id IN (2728)

There is no difference when the query optimisation engine breaks it down to its relevant query operators.

Related

Joining multiple tables with single join clause (sqlite)

So I'm learning SQL (sqlite flavour) and looking through the sqlite JOIN-clause documentation, I figure that these two statements are valid:
SELECT *
FROM table1
JOIN (table2, table3) USING (id);
SELECT *
FROM table1
JOIN table2 USING (id)
JOIN table3 USING (id)
(or even, but that's beside the point:
SELECT *
FROM table1
JOIN (table 2 JOIN table3 USING id) USING id
)
Now I've seen the second one (chained join) a lot in SO questions on JOIN clauses, but rarely the first (grouped table-query). Both queries execute in SQLiteStudio for the non-simplified case.
A minimal example is provided here based on this code
CREATE TABLE table1 (
id INTEGER PRIMARY KEY,
field1 TEXT
)
WITHOUT ROWID;
CREATE TABLE table2 (
id INTEGER PRIMARY KEY,
field2 TEXT
)
WITHOUT ROWID;
CREATE TABLE table3 (
id INTEGER PRIMARY KEY,
field3 TEXT
)
WITHOUT ROWID;
INSERT INTO table1 (field1, id)
VALUES ('FOO0', 0),
('FOO1', 1),
('FOO2', 2),
('FOO3', 3);
INSERT INTO table2 (field2, id)
VALUES ('BAR0', 0),
('BAR2', 1),
('BAR3', 3);
INSERT INTO table3 (field3, id)
VALUES ('PIP0', 0),
('PIP1', 1),
('PIP2', 2);
SELECT *
FROM table1
JOIN (table2, table3) USING (id);
SELECT *
FROM table1
JOIN table2 USING (id)
JOIN table3 USING (id);
Could someone explain why one would use one over the other and if they are not equivalent for certain input data, provide an example? The first certainly looks cleaner (at least less redundant) to me.
INNER JOIN ON vs WHERE clause has been suggested as a possible duplicate. While it touches on the use of , as a join operator, I feel the questions and especially the answers are more focussed on the readability aspect and use of WHERE vs JOIN. My question is more about the general validity and possible differences in outcome (given the necessary input to induce the difference).
SQLite does not enforce a proper join syntax. It sees the join operator ([INNER] JOIN, LEFT [OUTER] JOIN, etc., even the comma of the outdated 1980s join syntax) separate from the condition (ON, USING). That is not good, because it makes joins more prone to errors. The SQLite docs are hence a very bad reference for learning joins. (And SQLite itself a bad system for learning them, because the DBMS doesn't detect standard SQL join violations.)
Stick to the syntax defined by the SQL standard (and don't ever use comma-separated joins):
FROM table [alias]
((([INNER] | [(LEFT|FULL) [OUTER]]) JOIN table [alias] (ON conditions | USING ( columns ))) | (CROSS JOIN table [alias]))
((([INNER] | [(LEFT|FULL) [OUTER]]) JOIN table [alias] (ON conditions | USING ( columns ))) | (CROSS JOIN table [alias]))
...
(Hope, I've got this right :-) And I also hope this is readable enough :-| I've omitted NATURAL JOIN and RIGHT [OUTER] JOIN here, because I don't recommend using them at all.)
For table you can place some table name or view or a subquery (the latter including parentheses, e.g. (select * from mytable)). Columns in USING have to be surrounded by parentheses (e.g. USING (a, b, c)). (You can of course use parentheses around ON conditions as well, if you find this more readable.)
In your case, a properly written query would be:
SELECT *
FROM table1
JOIN table2 USING (id)
JOIN table3 USING (id)
or
SELECT *
FROM table1 t1
JOIN table2 t2 ON t2.id = t1.id
JOIN table3 t3 ON t3.id = t1.id
for instance. The example suggests three 1:1 related tables, though. In real life these are extremely rare and a more typical example would be
SELECT *
FROM table1 t1
JOIN table2 t2 ON t2.t1_id = t1.id
JOIN table3 t3 ON t3.t2_id = t2.id
After fixing syntax, these are not the same for all tables, read the syntax & definitions of the join operators in the manual. Comma is cross join with lower precedence than join keyword joins. Different DBMS's SQLs have syntax variations. Read the manual. Some allow naked join for cross join.
using returns only one column for each specified column name & natural is using for all common columns; but other joins are based on cross join & return a column for every input column. So since here tables 2 & 3 have id columns the comma returns a table with 2 id columns. Then using (id) doesn't make sense since one operand has 2 id columns.
If only tables 1 & 3 have an id column, clearly the 2nd query can't join 1 & 2 using id.
There are always many ways to express things. In particular SQL DBMSs execute many different expressions the same way. Research re relational query implementation/optimization in general, in SQL & in your DBMS manual. Generally no simple query variations like these make a difference in execution for the simplest query engine. (We see that in SQLite cross join "is handled differently by the query optimizer".)
First learn to write straightforward queries & learn what the operators do & what their syntax & restrictions are.
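As a rough check of the points above, here is a sketch that loads the question's data into SQLite through Python's sqlite3 module. WITHOUT ROWID is dropped for brevity, which is my simplification. It shows that the comma join of table2 and table3 carries two id columns, and what the chained join returns. The grouped form is run inside a try/except and only printed, since I am not certain its behaviour is the same across SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, field1 TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, field2 TEXT);
CREATE TABLE table3 (id INTEGER PRIMARY KEY, field3 TEXT);
INSERT INTO table1 (field1, id) VALUES ('FOO0', 0), ('FOO1', 1), ('FOO2', 2), ('FOO3', 3);
INSERT INTO table2 (field2, id) VALUES ('BAR0', 0), ('BAR2', 1), ('BAR3', 3);
INSERT INTO table3 (field3, id) VALUES ('PIP0', 0), ('PIP1', 1), ('PIP2', 2);
""")

# The comma is a cross join: one output column per input column, so id appears twice.
cur = conn.execute("SELECT * FROM table2, table3")
comma_columns = [d[0] for d in cur.description]
comma_rows = cur.fetchall()
print(comma_columns)    # ['id', 'field2', 'id', 'field3']
print(len(comma_rows))  # 9

# Chained joins: only ids present in all three tables survive.
chained = conn.execute("""
    SELECT * FROM table1
    JOIN table2 USING (id)
    JOIN table3 USING (id)
    ORDER BY table1.id
""").fetchall()
print(chained)  # [(0, 'FOO0', 'BAR0', 'PIP0'), (1, 'FOO1', 'BAR2', 'PIP1')]

# Grouped form from the question: no expected output is claimed here.
try:
    grouped = conn.execute(
        "SELECT * FROM table1 JOIN (table2, table3) USING (id)"
    ).fetchall()
except sqlite3.Error as exc:
    grouped = exc
print(grouped)
```

Comparing the last two printed results on your own SQLite version shows directly whether the two forms agree for this data.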

SQL Server / MS Access help understanding WHERE CLAUSE in terms of NULLS and LEFT JOIN

I have the following example generated by MS Access for generating results based on table1 without matching table2 on the IP Address columns.
SELECT
Table1.ID, Table1.IP_Address, Table1.Field1
FROM
Table1
LEFT JOIN
Table2 ON Table1.[IP_Address] = Table2.[IP Address]
WHERE
(((Table2.[IP Address]) IS NULL));
While trying to analyze "WHERE (((Table2.[IP Address]) Is Null))" I do not understand how this makes sense, as I interpret it as only return results that are NULL for table2#IP Address. My understanding of WHERE clause is like a filter mechanism for your query and NULL is blank. Can someone help me understand this counter-intuitive statement?
First, a more intuitive way to write the query would use NOT EXISTS:
SELECT Table1.ID, Table1.IP_Address, Table1.Field1
FROM Table1
WHERE NOT EXISTS (SELECT 1
FROM Table2
WHERE Table1.[IP_Address] = Table2.[IP Address]
);
That said, the LEFT JOIN method is perfectly reasonable -- and sensible too.
LEFT JOIN keeps all the rows in the first table (Table1) and matching rows in the second. If there is no match, then the Table2 columns need to be filled with a value -- and for the non-matches, that value is NULL.
The WHERE clause is keeping only these NULL values. Voila! It keeps the rows in Table1 that have no matching value in Table2.
You already mentioned the answer:
generating results base on table1 without matching table2
You use a LEFT JOIN, so you get all the rows from the LEFT table, together with the matching rows from the RIGHT table, or empty (null) values where there is no match.
The unmatched rows from the RIGHT table will have Table2.[IP Address] equal to Null (since they are unmatching).
So the condition:
WHERE Table2.[IP Address] Is Null
will do exactly what you need:
fetch only these rows from the LEFT table that do not have a match on the RIGHT table.
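To see the mechanism step by step, here is a small sketch with made-up IP data, run on SQLite through Python's sqlite3 module rather than on Access or SQL Server (SQLite also accepts the [bracketed] identifier quoting). It first shows the raw LEFT JOIN output, then the IS NULL filter, then the NOT EXISTS form for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ID INT, IP_Address TEXT, Field1 TEXT);
CREATE TABLE Table2 ([IP Address] TEXT);
INSERT INTO Table1 VALUES (1, '10.0.0.1', 'a'), (2, '10.0.0.2', 'b'), (3, '10.0.0.3', 'c');
INSERT INTO Table2 VALUES ('10.0.0.1'), ('10.0.0.3');
""")

# Step 1: the LEFT JOIN alone. Row 2 has no match, so its Table2 column is NULL.
joined = conn.execute("""
    SELECT Table1.ID, Table2.[IP Address]
    FROM Table1
    LEFT JOIN Table2 ON Table1.[IP_Address] = Table2.[IP Address]
    ORDER BY Table1.ID
""").fetchall()
print(joined)  # [(1, '10.0.0.1'), (2, None), (3, '10.0.0.3')]

# Step 2: the WHERE keeps only the NULL-padded rows, i.e. the unmatched ones.
left_join_is_null = conn.execute("""
    SELECT Table1.ID, Table1.IP_Address, Table1.Field1
    FROM Table1
    LEFT JOIN Table2 ON Table1.[IP_Address] = Table2.[IP Address]
    WHERE Table2.[IP Address] IS NULL
""").fetchall()

# The NOT EXISTS form returns the same rows.
not_exists = conn.execute("""
    SELECT Table1.ID, Table1.IP_Address, Table1.Field1
    FROM Table1
    WHERE NOT EXISTS (SELECT 1 FROM Table2
                      WHERE Table1.[IP_Address] = Table2.[IP Address])
""").fetchall()

print(left_join_is_null)  # [(2, '10.0.0.2', 'b')]
print(not_exists)         # [(2, '10.0.0.2', 'b')]
```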

How does JOIN work exactly in SQL

I know that joins work by combining two or more tables by their attributes, so if you have two tables that both have three columns and both have column INDEX, if you use table1 JOIN table2 you will get a new table with 5 columns, but what if you do not have a column that is shared by both table1 and table2? Can you still use JOIN or do you have to use TIMES?
Join is not a method for combining tables. It is a method to select records (and selected fields) from 2 or more tables where every table in the query must carry a field that can be matched to a field in another table in the query. The matched fields need not have the same name, but must carry the same type of data. Lacking this would be like trying to create meaning from joining a list of license plates of cars in NYC, with height data from lumberjacks in Washington state -- not meaningful.
Example:
Select h.name, h.home_address, h.home_phone, w.work_address,
w.department
from home h, work w
where h.employee_id = w.emp_id
As long as both columns, employee_id and emp_id, carry the same information, this query will work.
In Microsoft Access, to get five columns from a three column table joined to a two column table, you'd use:
SELECT Table1.*, Table2.* FROM Table1 INNER JOIN Table2 ON Table1.Field1 = Table2.Field1;
You can query whatever you want, and join whatever you want, though.
If your one table is a list of people, and your other is a list of cars, and you want to see what people have names that are also models of cars, you can do:
SELECT Table1.Name, Table1.Age, Table2.Make, Table2.Year
FROM Table1 INNER JOIN Table2 ON Table1.Name = Table2.Model;
Only when Name is the same as Model will it show a record.
This is the same idea for joining tables in any relational DBMS I've used.
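A quick sketch of the people/cars example, with hypothetical rows of my own and SQLite (via Python's sqlite3 module) standing in for Access. It also runs a CROSS JOIN, which needs no matching column at all and simply pairs every row with every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Name TEXT, Age INT);              -- people (made-up rows)
CREATE TABLE Table2 (Make TEXT, Model TEXT, Year INT); -- cars (made-up rows)
INSERT INTO Table1 VALUES ('Sierra', 30), ('Alice', 41);
INSERT INTO Table2 VALUES ('GMC', 'Sierra', 2018), ('Toyota', 'Corolla', 2015);
""")

# The joined columns have different names; only rows where Name equals Model appear.
name_is_model = conn.execute("""
    SELECT Table1.Name, Table1.Age, Table2.Make, Table2.Year
    FROM Table1 INNER JOIN Table2 ON Table1.Name = Table2.Model
""").fetchall()
print(name_is_model)  # [('Sierra', 30, 'GMC', 2018)]

# CROSS JOIN has no condition: 2 people x 2 cars = 4 rows.
all_pairs = conn.execute("SELECT * FROM Table1 CROSS JOIN Table2").fetchall()
print(len(all_pairs))  # 4
```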
You are right you can join two tables even if they do not have shared column.
Joins usually follow a primary key/foreign key relationship, which prevents mistakes on inserting or deleting, such as a user trying to insert a record that does not have a parent record.
There are many types of join; you can view them here:
http://dev.mysql.com/doc/refman/5.7/en/join.html
LEFT JOIN: selects all records from the first table, then all records from the second table that fulfil the condition after the ON clause.
You can't join the tables if they do not share a common column. If you can find a third table that has common columns with table1 and table2, you can get them to join that way: join table2 and table3 on a common column, and then join table3 back to table1 on a common column.

sql - what's the faster/better way to refer to columns in a where clause with inner joins?

Say I've got a query like this:
select table1.id, table1.name
from table1
inner join table2 on table1.id = table2.id
where table1.name = "parent" and table2.status = 1
Is it true that, since there's an inner join, I can refer the table2's status column even from table1? Like this:
select table1.id, table1.name
from table1
inner join table2 on table1.id = table2.id
where table1.name = "parent" and table1.status = 1
And if yes, what's the best of the two ways?
If I am not mistaken, you are asking whether, in an inner join, two fields of the same name, data type and length become one field in the particular query. Technically that is not the case. Regardless of anything, Table1.Status will refer to Table1's value and Table2.Status will refer to Table2's value.
The two queries above CAN produce different results from each other.
A good rule on this is that you stick your conditions on the base table, or Table1, in this case. If a field is exclusive to another table, that's when you'll use that Table's field.
No, that's not true. With an inner join, if table1 has m rows and table2 has n rows, the set produced by joining the two tables has at most m*n rows, depending on the match condition given in the ON clause. It's not m+n rows, and the columns of the two tables are not merged at the database level. The status column remains in the table where it was defined.
Hope that helps!!!
You can see this is not the case if you do
CREATE TABLE table1 (id INT, name VARCHAR);
CREATE TABLE table2 (id INT, status INT);
Now if you run your second query you will get an error because you refer to table1.status, and the status column does not exist in table1.
If there was a status field in both tables the query would run, but likely would not give the results you want e.g. assume status in table1 was always 1, and in table2 was always 0. Now your first query could never return rows, but your second one certainly could return rows.
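This can be tried out with a short sketch on SQLite through Python's sqlite3 module. The two inserted rows are made up, and the string literal uses single quotes, which is the standard SQL form. The exact error text is SQLite's and other systems word it differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INT, name VARCHAR);
CREATE TABLE table2 (id INT, status INT);
INSERT INTO table1 VALUES (1, 'parent');
INSERT INTO table2 VALUES (1, 1);
""")

# First query: status is read from table2, where it is defined.
first = conn.execute("""
    SELECT table1.id, table1.name
    FROM table1
    INNER JOIN table2 ON table1.id = table2.id
    WHERE table1.name = 'parent' AND table2.status = 1
""").fetchall()
print(first)  # [(1, 'parent')]

# Second query: table1 has no status column, so the statement is rejected.
try:
    conn.execute("""
        SELECT table1.id, table1.name
        FROM table1
        INNER JOIN table2 ON table1.id = table2.id
        WHERE table1.name = 'parent' AND table1.status = 1
    """)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)  # SQLite's "no such column" error for table1.status
```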

SQL join result errors

I'm trying to run this join and I'm not receiving the correct values.
My first query returns about 25,000 records:
SELECT count(*) from table1 as DSO,
table2 as EAR
WHERE
(UCASE(TRIM(EAR.value)) = UCASE(TRIM(DSO.value)))
AND
(UCASE(TRIM(EAR.value1)) = UCASE(TRIM(DSO.value1)))
My second query returns about 3,000,000:
SELECT count(*) from table1 as DSO
left join table2 as EAR
ON
(UCASE(TRIM(EAR.value)) = UCASE(TRIM(DSO.value)))
AND
(UCASE(TRIM(EAR.value1)) = UCASE(TRIM(DSO.value1)))
Table 1 has about 45,000 records in total, and that's what I should receive.
First query is an INNER JOIN and second one is a LEFT JOIN. You should expect quite different results. Also, look at the way db2400 treats NULLs with the UCASE and TRIM functions. My guess is that your left join is making some matches that you don't want.
The INNER JOIN in the first query is going to exclude any records from table1 that don't have a match in table2. That pretty quickly explains the lower count.
Either join will happily create more than one row for each record in table1 if it finds multiple matches in table2. The difference is that the LEFT JOIN will ALSO create one row for each record in table1 that doesn't have a match in table2. It sounds like you expect there to be a 1:1 match between the two tables, but that is not what you are getting.
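A tiny reproduction of that pattern, with made-up data and only one join column. It runs on SQLite through Python's sqlite3 module, so UPPER stands in for UCASE (SQLite has no UCASE function). It shows how the inner join count can end up below the table1 row count while the left join count ends up above it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (value TEXT);
CREATE TABLE table2 (value TEXT);
INSERT INTO table1 VALUES ('a'), ('b'), ('c'), ('d');
-- After TRIM and UPPER, all three rows below match the single row 'a'.
INSERT INTO table2 VALUES ('A '), (' a'), ('A');
""")

def count(sql):
    """Return the single number produced by a count(*) query."""
    return conn.execute(sql).fetchone()[0]

total = count("SELECT count(*) FROM table1")

# Comma join plus WHERE (an inner join): 'a' matches three times, b/c/d are dropped.
inner_count = count("""
    SELECT count(*) FROM table1 AS DSO, table2 AS EAR
    WHERE UPPER(TRIM(EAR.value)) = UPPER(TRIM(DSO.value))
""")

# LEFT JOIN: the same three matches, plus one NULL-padded row each for b, c and d.
left_count = count("""
    SELECT count(*) FROM table1 AS DSO
    LEFT JOIN table2 AS EAR
      ON UPPER(TRIM(EAR.value)) = UPPER(TRIM(DSO.value))
""")

print(total, inner_count, left_count)  # 4 3 6
```

Neither count equals the number of rows in table1 unless each table1 row has exactly one match in table2.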