When to cross or join two tables? - sql

Example 1:
SELECT name
FROM Customer, Order
WHERE Customer.id = Order.cid
Example 2:
SELECT name
FROM Customer JOIN Order
ON Customer.id = Order.cid
What is the difference between these two queries? When should I cross two tables vs JOIN?

Both will give you identical result. So there is no real situation to use one over another.
The comma separated join, is an ANSI 89 standard join, INNER JOIN is the newer ANSI 92 standard join.
However comma separated join syntax is depreciated we always prefer to use INNER JOIN syntax. When you want to join more than one table it will be difficult to follow the join conditions in Where clause where as INNER JOIN syntax is more readable

CROSS JOIN operation is a Cartesian product. Result of CROSS JOIN operation
between set A and set B is the superset that contains all values.
Then with operator WHERE you are filtering this result set.
INNER JOIN operation will try to find rows from both tables that correspond predicate
after keyword ON. That rows will go to the result set.
Practically, SQL engine can choose its own physical implementation CROSS JOIN operator.
SQL engine doesn't have to get huge result set and then filter it. SQL engine behaviour will
be similar when using INNER JOIN operation.

Related

SQL: JOIN vs LEFT OUTER JOIN?

I have multiple SQL queries that look similar where one uses JOIN and another LEFT OUTER JOIN. I played around with SQL and found that it the same results are returned. The codebase uses JOIN and LEFT OUTER JOIN interchangeably. While LEFT JOIN seems to be interchangeable with LEFT OUTER JOIN, I cannot I cannot seem to find any information about only JOIN. Is this good practice?
Ex Query1 using JOIN
SQL
SELECT
id,
name
FROM
u_users customers
JOIN
t_orders orders
ON orders.status=='PAYMENT PENDING'
Ex. Query2 using LEFT OUTER JOIN
SQL
SELECT
id,
name
FROM
u_users customers
LEFT OUTER JOIN
t_orders orders
ON orders.status=='PAYMENT PENDING'
As previously noted above:
JOIN is synonym of INNER JOIN. It's definitively different from all
types of OUTER JOIN
So the question is "When should I use an outer join?"
Here's a good article, with several great diagrams:
https://www.sqlshack.com/sql-outer-join-overview-and-examples/
The short answer your your question is:
Prefer JOIN (aka "INNER JOIN") to link two related tables. In practice, you'll use INNER JOIN most of the time.
INNER JOIN is the intersection of the two tables. It's represented by the "green" section in the middle of the Venn diagram above.
Use an "Outer Join" when you want the left, right or both outer regions.
In your example, the result set happens to be the same: the two expressions happen to be equivalent.
ALSO: be sure to familiarize yourself with "Show Plan" (or equivalent) for your RDBMS: https://www.sqlshack.com/execution-plans-in-sql-server/
'Hope that helps...
First the theory:
A join is a subset of the left join (all other things equal). Under some circumstances they are identical
The difference is that the left join will include all the tuples in the left hand side relation (even if they don't match the join predicate), while the join will only include the tuples of the left hand side that match the predicate.
For instance assume we have to relations R and S.
Say we have to do R JOIN S (and R LEFT JOIN S) on some predicate p
J = R JOIN S on (p)
Now, identify the tuples of R that are not in J.
Finally, add those tuples to J (padding any attribute in J not in R with null)
This result is the left join:
R LEFT JOIN S (p)
So when all the tuples of the left hand side of the relation are in the JOIN, this result will be identical to the Left Join.
back to you problem:
Your JOIN is very likely to include all the tuples from Users. So the query is the same if you use JOIN or LEFT JOIN.
The two are exactly equivalent, because the WHERE clause turns the LEFT JOIN into an INNER JOIN.
When filtering on all but the first table in a LEFT JOIN, the condition should usually be in the ON clause. Presumably, you also have a valid join condition, connecting the two tables:
SELEC id, name
FROM u_users u LEFT JOIN
t_orders o
ON o.user_id = u.user_id AND o.status = 'PAYMENT PENDING';
This version differs from the INNER JOIN version, because this version returns all users even those with no pending payments.
Both are the same, there is no difference here.
You need to use the ON clause when using Join. It can match any data between two tables when you don't use the ON clause.
This can cause performance issue as well as map unwanted data.
If you want to see the differences you can use "execution plans".
for example, I used the Microsoft AdventureWorks database for the example.
LEFT OUTER JOIN :
LEFT JOIN :
If you use the ON clause as you wrote, there is a possibility of looping.
Example "execution plans" is below.
You can access the correct mapping and data using the ON clause complement.
select
id,
name
from
u_users customers
left outer join
t_orders orders on customers.id = orders.userid
where orders.status=='payment pending'

What is the difference between CROSS JOIN and multiple tables in one FROM? [duplicate]

This question already has answers here:
What is the difference between using a cross join and putting a comma between the two tables?
(8 answers)
Closed last year.
What is the difference?
SELECT a.name, b.name
FROM a, b;
SELECT a.name, b.name
FROM a
CROSS JOIN b;
If there is no difference then why do both exist?
The first with the comma is an old style from the previous century.
The second with the CROSS JOIN is in newer ANSI JOIN syntax.
And those 2 queries will indeed give the same results.
They both link every record of table "a" against every record of table "b".
So if table "a" has 10 rows, and table "b" has 100 rows.
Then the result would be 10 * 100 = 1000 records.
But why does that first outdated style still exists in some DBMS?
Mostly for backward compatibility reasons, so that some older SQL's don't suddenly break.
Most SQL specialists these days would frown upon someone who still uses that outdated old comma syntax. (although it's often forgiven for an intentional cartesian product)
A CROSS JOIN is a cartesian product JOIN that's lacking the ON clause that defines the relationship between the 2 tables.
In the ANSI JOIN syntax there are also the OUTER joins: LEFT JOIN, RIGHT JOIN, FULL JOIN
And the normal JOIN, aka the INNER JOIN.
But those normally require the ON clause, while a CROSS JOIN doesn't.
And example of a query using different JOIN types.
SELECT *
FROM jars
JOIN apples ON apples.jar_id = jars.id
LEFT JOIN peaches ON peaches.jar_id = jars.id
CROSS JOIN bananas AS bnns
RIGHT JOIN crates ON crates.id = jars.crate_id
FULL JOIN nuts ON nuts.jar_id = jars.id
WHERE jars.name = 'FruityMix'
The nice thing about the JOIN syntax is that the link criteria and the search criteria are separated.
While in the old comma style that difference would be harder to notice. Hence it's easier to forget a link criteria.
SELECT *
FROM crates, jars, apples, peaches, bananas, nuts
WHERE apples.jar_id = jars.id
AND jars.name = 'NuttyFruitBomb'
AND peaches.jar_id = jars.id(+)
AND crates.id(+) = jar.crate_id;
Did you notice that the first query has 1 cartesian product join, but the second has 2? That's why the 2nd is rather nutty.
Both expressions perform a Cartesian product of the two given tables. They are hence equivalent.
Please note that from SQL style point of view, using JOIN has been the preferred syntax for a long time now.

Is there any difference in performance in using AND and WHERE during joins? [duplicate]

Is there ever a case where a join will not return data that a FROM multiple tables with the same conditions returns?
e.g.
SELECT *
FROM TableNames as Names
INNER JOIN TableNumbers as Numbers on Names.ID = Numbers.ID
VS
SELECT *
FROM TableNames as Names, TableNumbers as Numbers
WHERE Names.ID = Numbers.ID
An INNER JOIN (as in your first example) will always return the same data as your a cartesian join with a WHERE filter that uses the same join criteria (your second example).
However, note that this is not true for OUTER JOINs, where NULL values are filtered out in a cartesian join with a WHERE filter as join criteria.
Simply, both the queries are same and do the same thing.
Inner Join is generally considered more readable, especially when you join lots of tables.
The WHERE syntax is more relational model oriented.

What if i dont use Join Keyword in query?

I have a query where i am retrieving data from more than two tables. I am using the filter criteria in where clause but not using any join keyword
select
d.proc_code,
d.dos,
s.svc_type
from
claim_detail d, h_claim_hdr hh, car_svc s
where
d.bu_id="$inp_bu_id"
and
hh.bu_id="$inp_bu_id"
and
s.bu_id="$inp_bu_id"
and
d.audit_nbr="$inp_audit_nbr"
and
hh.audit_nbr="$inp_audit_nbr"
and
d.audit_nbr=hh.audit_nbr
and
s.car_svc_nbr=hh.aut_nbr
Is there a better way of writing this?
Although you are not using a JOIN keyword, your query does perform a JOIN.
A more "modern" way of writing your query (i.e. one following the ANSI SQL standard) would be as follows:
select
d.proc_code,
d.dos,
s.svc_type
from
claim_detail d
join
h_claim_hdr hh on d.audit_nbr=hh.audit_nbr
join
car_svc s on s.car_svc_nbr=hh.aut_nbr
where
d.bu_id="$inp_bu_id"
and
hh.bu_id="$inp_bu_id"
and
s.bu_id="$inp_bu_id"
and
d.audit_nbr="$inp_audit_nbr"
and
hh.audit_nbr="$inp_audit_nbr"
Note that this is simply a modern syntax. It expresses the same query, and it will not impact the performance.
Note that in order for a row to appear in the output of this query, the corresponding rows must exist in all three queries (i.e. it's an inner join). If you would like to return rows of claim_detail for which no h_claim_hdr and / or car_svc existed, use left outer join instead.
A comma in the from clause is essentially the same as a cross join. You really don't want to use a cross join, unless you really know what you are doing.
Proper join syntax has several advantages. The most important of which is the ability to express other types of joins easily and compatibly across databases.
Most people would probably find this version easier to follow and maintain:
select d.proc_code, d.dos, s.svc_type
from claim_detail d join
h_claim_hdr hh
on d.bu_id = hh.bu_id and d.audit_nbr = hh.audit_nbr
car_svc s
on d.bu_id = s.bu_id and s.car_svc_nbr = hh.aut_nbr
where d.bu_id = "$inp_bu_id"
d.audit_nbr = "$inp_audit_nbr";
Using the WHERE clause instead of the JOIN keyword is essentially a different syntax for doing a join. I believe it is called Theta syntax, where using the JOIN clause is called ANSI syntax.
I believe ANSI syntax is almost universally recommended, and some databases require ANSI syntax for outer JOINs.
If you do not use JOIN it will be an implicit inner join. As is in your example with the join criteria on your WHERE clause. So you could me missing records. Lets say you want all records from the first table even if there is not a corresponding record in the second. Your current code would only return the records from the first table that have a matching record in the second.
Joins

Group by in SQL Server giving wrong count

I have a query which works, goes like this:
Select
count(InsuranceOrderLine.AntallPotensiale) as potensiale,
COUNT(InsuranceOrderLine.AntallSolgt) as Solgt,
InsuranceProduct.Name,
InsuranceProductCategory.Name as Kategori
From
InsuranceOrderLine, InsuranceProduct, InsuranceProductCategory
where
InsuranceOrderLine.FKInsuranceProductId = InsuranceProduct.InsuranceProductID
and InsuranceProduct.FKInsuranceProductCategory = InsuranceProductCategory.InsuranceProductCategoryID
Group by
InsuranceProduct.name, InsuranceProductCategory.Name
This query over returns what I need, but when I try to add more table (InsuranceOrder) to be able to get the regardingUser column, then all the count values are way high.
Select
count(InsuranceOrderLine.AntallPotensiale) as Potensiale,
COUNT(InsuranceOrderLine.AntallSolgt) as Solgt,
InsuranceProduct.Name,
InsuranceProductCategory.Name as Kategori,
RegardingUser
From
InsuranceOrderLine, InsuranceProduct, InsuranceProductCategory, InsuranceSalesLead
where
InsuranceOrderLine.FKInsuranceProductId = InsuranceProduct.InsuranceProductID
and InsuranceProduct.FKInsuranceProductCategory = InsuranceProductCategory.InsuranceProductCategoryID
Group by
InsuranceProduct.name, InsuranceProductCategory.Name,RegardingUser
Thanks in advance
You're adding one more table to your FROM statement, but you don't specify any JOIN condition for that table - so your previous result set will do a FULL OUTER JOIN (cartesian product) with your new table! Of course you'll get duplication of data....
That's one of the reasons that I'm recommending never to use that old, legacy style JOIN - do not simply list a comma-separated bunch of tables in your FROM statement.
Always use the new ANSI standard JOIN syntax with INNER JOIN, LEFT OUTER JOIN and so on:
SELECT
count(iol.AntallPotensiale) as Potensiale,
COUNT(iol.AntallSolgt) as Solgt,
ip.Name,
ipc.Name as Kategori,
isl.RegardingUser
FROM
dbo.InsuranceOrderLine iol
INNER JOIN
dbo.InsuranceProduct ip ON iol.FKInsuranceProductId = ip.InsuranceProductID
INNER JOIN
dbo.InsuranceProductCategory ipc ON ip.FKInsuranceProductCategory = ipc.InsuranceProductCategoryID
INNER JOIN
dbo.InsuranceSalesLead isl ON ???????? -- JOIN condition missing here !!
When you do this, you first of all see right away that you're missing a JOIN condition here - how is this new table InsuranceSalesLead linked to any of the other tables already used in this SQL statement??
And secondly, your intent is much clearer, since the JOIN conditions linking the tables are where they belong - right with the JOIN - and don't clutter up your WHERE clauses ...
It looks like you added the table join which slightly multiplies count of rows - make sure, that you properly joining the table. And be careful with aggregate functions over several joined tables - joins very often lead to duplicates