Are cartesian (cross) joins with where statement still slower than inner joins? - sql

Compare these two queries:
select b.prod_name, b.prod_category, a.transaction_amt, a.transaction_dt
from transactions a, prod_xref b
where a.prod_id = b.id
VS.
select b.prod_name, b.prod_category, a.transaction_amt, a.transaction_dt
from transactions a
inner join prod_xref b on a.prod_id = b.id
Is the first query still slower than the second?
What are the benefits / disadvantages of using a cartesian join vs an explicit join statement?

To answer your question: a "Cartesian" or "cross" join is much slower than almost any other join.
The reason is that a CROSS join pairs each row of t1 with each row of t2.
The example you provided is not a CROSS join, though.
It's the old syntax for inner joins, where two or more tables are listed in the FROM clause, comma-separated, and the join condition goes in the WHERE clause.
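To make the distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the question; the sample rows are made up, and SQLite is only a stand-in for whatever database you actually use. It runs both query forms and a true cross join so the row counts can be compared.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prod_xref (id INTEGER PRIMARY KEY, prod_name TEXT, prod_category TEXT);
    CREATE TABLE transactions (prod_id INTEGER, transaction_amt REAL, transaction_dt TEXT);
    INSERT INTO prod_xref VALUES
        (1, 'widget', 'tools'), (2, 'gadget', 'toys'), (3, 'gizmo', 'toys');
    INSERT INTO transactions VALUES
        (1, 9.5, '2020-01-01'), (1, 3.0, '2020-01-02'),
        (2, 7.25, '2020-01-03'), (9, 1.0, '2020-01-04');
""")

cols = "b.prod_name, b.prod_category, a.transaction_amt, a.transaction_dt"
order = " ORDER BY a.transaction_dt"

# Old comma syntax with the join condition in WHERE.
implicit_rows = conn.execute(
    f"SELECT {cols} FROM transactions a, prod_xref b WHERE a.prod_id = b.id" + order
).fetchall()

# Explicit INNER JOIN with the same condition in ON.
explicit_rows = conn.execute(
    f"SELECT {cols} FROM transactions a INNER JOIN prod_xref b ON a.prod_id = b.id" + order
).fetchall()

# A true cross join has no condition at all: every row pairs with every row.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM transactions a CROSS JOIN prod_xref b"
).fetchone()[0]

print("same rows from both syntaxes:", implicit_rows == explicit_rows)
print("joined rows:", len(implicit_rows), "cross join rows:", cross_count)
```

With this data both syntaxes return the same three rows, while the unconditioned cross join returns 4 x 3 = 12. Whether the two forms also get the same execution plan depends on the database, so check with its EXPLAIN facility.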

Related

What is difference between where and join in Hive SQL when joining two tables?

For example,
-- use WHERE
SELECT
a.id
FROM
table_a as a,
table_b as b
WHERE
a.id = b.id;
-- use JOIN
SELECT
t1.id
FROM
(
SELECT
id
FROM
table_a as a
) t1
JOIN (
SELECT
id
FROM
table_b as b
) t2 ON t1.id = t2.id
What is difference between where and join in Hive SQL when joining two tables?
Join like this
FROM
table_a as a,
table_b as b
WHERE
a.id = b.id;
is bad practice because, in general, WHERE is applied after the join. It is then up to the optimizer to recognize the predicate as a join condition, convert the query into a proper join, and avoid a CROSS join (a join without an ON condition).
Always use an explicit JOIN with an ON condition when possible. That way the optimizer knows for sure that it is a join condition; it is also ANSI syntax and easier to understand.
Non-equi join conditions such as a.date between b.start and b.end cannot always be placed in the ON condition (Hive versions that support only equality conditions in ON reject them); in that case they can be moved to the WHERE. If there are then no other conditions in the ON, a cross join is used and the WHERE filter is applied afterwards. Such a join can multiply the data enormously before the WHERE filter runs and degrade performance.
So: always use an explicit ANSI JOIN with ON conditions when possible, put all equality conditions in the ON, and move non-equi conditions to the WHERE only when they cannot go in the ON. Keep join conditions in the ON and only filters in the WHERE. The optimizer will push filters into the join, or before it, when possible, but it is better not to rely on the optimizer alone: write good ANSI SQL that is easy to understand and to port to another database if needed.
You can check the difference between the plans with the EXPLAIN command.
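The blow-up described above for a range condition can be sketched with Python's sqlite3 module. This is SQLite, not Hive, so treat it only as an illustration of the row counts. The tables a and b and the columns dt, start_dt and end_dt are hypothetical stand-ins for the a.date, b.start and b.end of the example. SQLite's EXPLAIN QUERY PLAN is used where Hive would use EXPLAIN; its output format is different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, dt TEXT);
    CREATE TABLE b (label TEXT, start_dt TEXT, end_dt TEXT);
""")
# 30 daily rows in a, 3 week-long ranges in b (hypothetical data).
conn.executemany("INSERT INTO a VALUES (?, ?)",
                 [(i, f"2020-01-{i:02d}") for i in range(1, 31)])
conn.executemany("INSERT INTO b VALUES (?, ?, ?)",
                 [("week1", "2020-01-01", "2020-01-07"),
                  ("week2", "2020-01-08", "2020-01-14"),
                  ("week3", "2020-01-15", "2020-01-21")])

# Row pairs a cross join produces before any filter is applied.
pairs_before_filter = conn.execute(
    "SELECT COUNT(*) FROM a CROSS JOIN b"
).fetchone()[0]

# Row pairs that survive the non-equi condition in WHERE.
range_query = """
    SELECT COUNT(*) FROM a CROSS JOIN b
    WHERE a.dt BETWEEN b.start_dt AND b.end_dt
"""
pairs_after_filter = conn.execute(range_query).fetchone()[0]

# Ask SQLite how it intends to run the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + range_query).fetchall()

print(pairs_before_filter, pairs_after_filter)
for step in plan:
    print(step)
```

Here 90 candidate pairs shrink to 21 after the filter. On real tables the candidate set is the product of both row counts, which is why an equality condition in the ON helps so much when one is available.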

How to implement SQL joins without using JOIN?

How does one implement SQL joins without using the JOIN keyword?
This is not really necessary, but I thought that by doing this I could better understand what joins actually do.
The basic INNER JOIN is easy to implement.
The following:
SELECT L.XCol, R.YCol
FROM LeftTable AS L
INNER JOIN RightTable AS R
ON L.IDCol=R.IDCol;
is equivalent to:
SELECT L.XCol, R.YCol
FROM LeftTable AS L, RightTable AS R
WHERE L.IDCol=R.IDCol;
In order to extend this to a LEFT/RIGHT/FULL OUTER JOIN, you only need to UNION the rows with no match, along with NULL in the correct columns, to the previous INNER JOIN.
For a LEFT OUTER JOIN, add:
UNION ALL
SELECT L.XCol, NULL /* cast the NULL as needed */
FROM LeftTable AS L
WHERE NOT EXISTS (
SELECT * FROM RightTable AS R
WHERE L.IDCol=R.IDCol)
For a RIGHT OUTER JOIN, add:
UNION ALL
SELECT NULL, R.YCol /* cast the NULL as needed */
FROM RightTable AS R
WHERE NOT EXISTS (
SELECT * FROM LeftTable AS L
WHERE L.IDCol=R.IDCol)
For a FULL OUTER JOIN, add both of the above.
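As a quick check of the LEFT OUTER JOIN recipe above, the following sketch runs it in SQLite through Python's sqlite3 module and compares it with the native LEFT OUTER JOIN. The table and column names are the ones used above; the sample rows are invented for the test.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LeftTable (IDCol INTEGER, XCol TEXT);
    CREATE TABLE RightTable (IDCol INTEGER, YCol TEXT);
    INSERT INTO LeftTable VALUES (1, 'x1'), (2, 'x2'), (3, 'x3');
    INSERT INTO RightTable VALUES (1, 'y1'), (1, 'y1b'), (4, 'y4');
""")

# Native outer join, for reference.
native_left = conn.execute("""
    SELECT L.XCol, R.YCol
    FROM LeftTable AS L
    LEFT OUTER JOIN RightTable AS R
    ON L.IDCol = R.IDCol
""").fetchall()

# Inner join via comma syntax, plus the unmatched left rows padded with NULL.
emulated_left = conn.execute("""
    SELECT L.XCol, R.YCol
    FROM LeftTable AS L, RightTable AS R
    WHERE L.IDCol = R.IDCol
    UNION ALL
    SELECT L.XCol, NULL
    FROM LeftTable AS L
    WHERE NOT EXISTS (
        SELECT * FROM RightTable AS R
        WHERE L.IDCol = R.IDCol)
""").fetchall()

# Compare as multisets, since neither query guarantees a row order.
same_rows = Counter(native_left) == Counter(emulated_left)
print(same_rows, len(native_left))
```

UNION ALL rather than UNION matters here: UNION would remove duplicate rows that a real outer join keeps.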
There is an older deprecated SQL syntax that allows you to join without using the JOIN keyword, but I personally find it more confusing than any permutation of the JOIN operator I've ever seen. Here's an example:
SELECT A.CustomerName, B.Address1, B.City, B.State, B.Zip
FROM dbo.Customers A, dbo.Addresses B
WHERE A.CustomerId = B.CustomerId
In the older way of doing it, you join by separating the tables with a comma and specifying the JOIN conditions in the WHERE clause. Personally, I would prefer the JOIN syntax:
SELECT A.CustomerName, B.Address1, B.City, B.State, B.Zip
FROM dbo.Customers A
JOIN dbo.Addresses B
ON A.CustomerId = B.CustomerId
The reason you should shy away from this old style of join is clarity and readability. When you are simply joining one table to another, it's pretty easy to figure out what's going on. When you're combining multiple types of joins across a half dozen (or more) tables, this older syntax becomes very challenging to manage.
The best way to get a handle on the JOIN operator is working with it. Here's a decent visual example of what the different JOINs do:
http://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
Some more info:
https://sqlblog.org/2009/10/08/bad-habits-to-kick-using-old-style-joins
http://www.sqlservercentral.com/blogs/brian_kelley/2009/09/30/the-old-inner-join-syntax-vs-the-new-inner-join-syntax/
When SQL was an infant we didn't have "inner join" "left outer join" etc. All we did was list the tables like this:
FROM table1, table2, table3, .... tablen
Then we had a where clause that was like a novel in length. Some of the conditions were for filtering the data, and many were there to join tables, like this:
FROM table1, table2, table3, .... tablen
WHERE table1.code = 'x' and table1.id = table3.fk and table2.name like 'a%' and table2.id = table1.fk and tablen.fk = table3.id and table2.dt >= '2014-01-01'
From this we hoped like heck we had all the tables nicely related, and we crossed our fingers. The worst-case scenario, which happened a lot, was that we forgot to include a table at all in the where clause. This was not nice, because what we get when we do that is a "Cartesian product" (basically a multiplication of all rows by the number of rows in the table we missed).
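The forgotten-condition accident is easy to reproduce. The sketch below uses Python's sqlite3 module with three small made-up tables, loosely named after the ones above, and counts rows with and without the condition that ties in table3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, fk INTEGER);
    CREATE TABLE table2 (id INTEGER, name TEXT);
    CREATE TABLE table3 (id INTEGER, fk INTEGER);
    INSERT INTO table1 VALUES (1, 10), (2, 20);
    INSERT INTO table2 VALUES (10, 'alpha'), (20, 'beta');
    INSERT INTO table3 VALUES (100, 1), (101, 1), (102, 2), (103, 2), (104, 2);
""")

# All three tables are related in the WHERE clause.
complete_count = conn.execute("""
    SELECT COUNT(*)
    FROM table1, table2, table3
    WHERE table2.id = table1.fk
      AND table3.fk = table1.id
""").fetchone()[0]

# The condition for table3 was forgotten: every joined row is now
# repeated once for each of the 5 rows in table3.
forgotten_count = conn.execute("""
    SELECT COUNT(*)
    FROM table1, table2, table3
    WHERE table2.id = table1.fk
""").fetchone()[0]

print(complete_count, forgotten_count)
```

The complete query returns 5 rows and the incomplete one 10. Nothing errors out, which is what made this mistake so easy to miss.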
Then came ANSI standard join syntax, and life was better. We now place the join conditions on the join - not in the where clause - and as a bonus the where clause is easier to understand.
I don't think you will find it easier to understand this ancient syntax, for example an outer join was join = bizarre(+) or maybe it was (+)bizarre = join (I try not to remember).
Try http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins

Is inner join the same as equi-join?

Can you tell me if inner join and equi-join are the same or not ?
An 'inner join' is not the same as an 'equi-join' in general terms.
'equi-join' means joining tables using the equality operator or equivalent. I would still call an outer join an 'equi-join' if it only uses equality (others may disagree).
'inner join' is opposed to 'outer join' and determines how to join two sets when there is no matching value.
Simply put: an equi-join is one possible type of inner join.
For a more in-depth explanation:
An inner join is a join that returns only those rows from the joined tables for which a certain condition is met. The condition may be an equality, in which case we have an equi-join; if it is not an equality (it may be an inequality, greater than, less than, between, etc.), we have a non-equi-join, more precisely called a theta-join.
If we do not want such conditions to be necessarily met, we can have full outer joins (all rows from both tables returned), left joins (all rows from the left table returned, only matching rows from the right table), and right joins (all rows from the right table returned, only matching rows from the left table).
The answer is NO.
An equi-join matches two columns from two tables using the explicit operator =:
Example:
select *
from table1 T1, table2 T2
where T1.column_name1 = T2.column_name2
An inner join returns only the combinations of rows from the two tables that satisfy the join condition; with no condition at all you would get the cross product of the two tables. To get the right result you can use an equi-join or a natural join (for which the column names in both tables must be the same).
Using an equi-join (explicit and implicit):
select *
from table1 T1 INNER JOIN table2 T2
on T1.column_name = T2.column_name
select *
from table1 T1, table2 T2
where T1.column_name = T2.column_name
Or using a natural join:
select *
from table1 T1 NATURAL JOIN table2 T2
The answer is no. Here is the short and simple version for readers:
An inner join can have equality (=) and other operators (like <, >, <>) in the join condition.
An equi-join has only the equality (=) operator in the join condition.
An equi-join can be an inner join, a left outer join, or a right outer join.
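The three statements above can be illustrated with a small sketch in Python's sqlite3 module. The tables t1 and t2 and their single column n are hypothetical; the point is only that "inner" versus "outer" and "equi" versus "non-equi" are independent choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (n INTEGER);
    CREATE TABLE t2 (n INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2), (3), (4);
""")

# Inner join with an equality condition: an inner equi-join.
inner_equi = conn.execute(
    "SELECT t1.n, t2.n FROM t1 INNER JOIN t2 ON t1.n = t2.n ORDER BY t1.n"
).fetchall()

# Inner join with a non-equality condition: still an inner join, not an equi-join.
inner_theta = conn.execute(
    "SELECT t1.n, t2.n FROM t1 INNER JOIN t2 ON t1.n < t2.n ORDER BY t1.n, t2.n"
).fetchall()

# Left outer join with an equality condition: an equi-join that is not an inner join.
left_equi = conn.execute(
    "SELECT t1.n, t2.n FROM t1 LEFT OUTER JOIN t2 ON t1.n = t2.n ORDER BY t1.n"
).fetchall()

print(inner_equi)
print(inner_theta)
print(left_equi)
```

The left outer equi-join keeps the row for n = 1 with a NULL on the right, while the inner equi-join drops it.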
If a difference has to be made out, then I think here it is. I tested it with DB2.
In an 'equi join' you have to select the comparing column of the table being joined; in an inner join it is not compulsory to do that. Example:
Select k.id,k.name FROM customer k
inner join dealer on(
k.id =dealer.id
)
Here the resulting rows have only two columns:
id name
But I think in an equi join you have to select the columns of the other table too:
Select k.id,k.name,d.id FROM customer k,dealer d
where
k.id =d.id
This results in rows with three columns; I think there is no way to avoid the unwanted compared column of dealer here (even if you don't want it). The rows will look like:
id(from customer) name(from Customer) id(from dealer)
Maybe this is not true for your question, but it might be one of the major differences.
The answer is YES, as far as the result set is concerned. Here is an example.
Consider three tables:
orders(ord_no, purch_amt, ord_date, customer_id, salesman_id)
customer(customer_id,cust_name, city, grade, salesman_id)
salesman(salesman_id, name, city, commission)
Now suppose I have a task like this:
Find the details of an order.
Using INNER JOIN:
SELECT * FROM orders a INNER JOIN customer b ON a.customer_id=b.customer_id
INNER JOIN salesman c ON a.salesman_id=c.salesman_id;
Using EQUI JOIN:
SELECT * FROM orders a, customer b,salesman c where
a.customer_id=b.customer_id and a.salesman_id=c.salesman_id;
Execute both queries. You will get the same output.
Coming to your question: there is no difference in the output of an equi-join and an inner join, but there might be a difference in how the two are executed internally.

SQL is this equivalent to a LEFT JoIn?

Is this equivalent to a LEFT JOIN?
SELECT DISTINCT a.name, b.name
FROM tableA a,
(SELECT DISTINCT name FROM tableB) as b
It seems as though there is no link between the two tables.
Is there an easier / more efficient way to write this?
No, it is equivalent to a cross or cartesian join (really bad) with a DISTINCT applied afterwards. It is pretty hard to know what you really want from the query as it stands.
Isn't this the same as
SELECT DISTINCT a.name, b.name
FROM tableA a, tableB b
although I would be questioning the purpose for this query.
I hate the stigma people apply to cartesian joins. They're wonderful when used properly. I have a payroll application and we have to apply all the different taxing authorities to each employee. So, I have one table of employees and one table of taxing authorities.
Anyway.. I just wanted to defend the wonderful cartesian join. (:
</soapbox>
It's ANSI-89 syntax for a cross join, producing a cartesian product (that's bad). Re-written using ANSI-92 JOIN syntax:
If on SQL Server/Oracle/Postgres, use:
SELECT DISTINCT
a.name,
b.name
FROM TABLEA a
CROSS JOIN (SELECT b.name
FROM TABLEB b) b
MySQL supports using:
SELECT DISTINCT
a.name,
b.name
FROM TABLEA a
JOIN (SELECT b.name
FROM TABLEB b) AS b
We'd need to know if there is any column(s) to tie records between the two tables to one another in order to update the query to use either an INNER JOIN or OUTER JOIN.
Since there is no joining field you have a cross join. The distinct limits the total number of records to remove duplicates, but probably still is not giving you the answer you want.
Do this: check the number of records you are getting. Then write a left join joining on company name (or company id, which is what you really should have as a join field, since company names change frequently). I'll bet you get a different number of records returned. I did this with two tables I had handy, and this is what I got:
Table a had 467 records
Table b had 4413 records
The cross join had 2060871
The cross join with the distinct had 826804
The left join had 4712
The inner join had 893
So you can see that adding the DISTINCT to the cross join lowers the number of records returned but doesn't guarantee you will get the result you would have had with the correct join. Given that you said the tables were company and company address, it would be very unlikely that a cross join was what you wanted.
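The same comparison can be reproduced on a toy scale. This sketch uses Python's sqlite3 module with two hypothetical tables, tableA and tableB, each holding a name column (invented data, not the tables counted above), and counts the rows each kind of join returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (name TEXT);
    CREATE TABLE tableB (name TEXT);
    INSERT INTO tableA VALUES ('acme'), ('acme'), ('bolt'), ('core');
    INSERT INTO tableB VALUES ('acme'), ('bolt'), ('bolt'), ('dune');
""")

def count(sql):
    """Return the number of rows produced by the given SELECT statement."""
    return conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]

# Column aliases keep the two name columns distinct inside the counting subquery.
cols = "a.name AS a_name, b.name AS b_name"

cross_count = count(f"SELECT {cols} FROM tableA a CROSS JOIN tableB b")
distinct_cross_count = count(f"SELECT DISTINCT {cols} FROM tableA a CROSS JOIN tableB b")
left_count = count(f"SELECT {cols} FROM tableA a LEFT JOIN tableB b ON a.name = b.name")
inner_count = count(f"SELECT {cols} FROM tableA a INNER JOIN tableB b ON a.name = b.name")

print(cross_count, distinct_cross_count, left_count, inner_count)
```

With this data the four counts are 16, 9, 5 and 4. DISTINCT trims the cross join but still leaves pairs such as ('acme', 'dune') that no real join on name would ever produce.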

Meaning of (+) in SQL queries

I've come across some SQL queries in Oracle that contain '(+)' and I have no idea what that means. Can someone explain its purpose or provide some examples of its use?
Thanks
It's Oracle's synonym for OUTER JOIN.
SELECT *
FROM a, b
WHERE b.id(+) = a.id
gives same result as
SELECT *
FROM a
LEFT OUTER JOIN b
ON b.id = a.id
The (+) is a shortcut for OUTER JOIN; depending on which side you put it on, it indicates a LEFT or RIGHT OUTER JOIN.
Check the second entry in this forum post for some examples
You use this to assure that the table you're joining doesn't reduce the amount of records returned. So it's handy when you're joining to a table that may not have a record for every key you're joining on.
For example, if you were joining a Customer and Purchase table:
To list all customers and all their purchases, do an outer join (+) on the Purchase table so customers that haven't purchased anything still show up in your report.
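Here is a small sketch of that Customer/Purchase report using Python's sqlite3 module. SQLite does not accept Oracle's (+) operator, so the sketch uses the ANSI LEFT OUTER JOIN form instead; the column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (customer_id INTEGER, item TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO purchase VALUES (1, 'book'), (1, 'pen'), (3, 'lamp');
""")

# Outer join: customers without purchases are kept, with NULL for the item.
# In Oracle's old syntax the condition would read c.id = p.customer_id(+).
report = conn.execute("""
    SELECT c.name, p.item
    FROM customer c
    LEFT OUTER JOIN purchase p ON c.id = p.customer_id
    ORDER BY c.id, p.item
""").fetchall()

# Inner join: customers without purchases disappear from the result.
inner_only = conn.execute("""
    SELECT c.name, p.item
    FROM customer c
    INNER JOIN purchase p ON c.id = p.customer_id
    ORDER BY c.id, p.item
""").fetchall()

no_purchases = [name for name, item in report if item is None]
print(report)
print(no_purchases)
```

Bob has no purchases, so he appears only in the outer-join report, paired with None (SQL NULL).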
IIRC, the + is used in older versions of Oracle to indicate an outer join in the pre-ANSI SQL join syntax. In other words:
select foo,bar
from a, b
where a.id = b.id(+)
is the equivalent of
select foo,bar
from a left outer join b
on a.id = b.id
NOTE: this may be backwards/slightly incorrect, as I've never used the pre-ANSI SQL syntax.