If an inner join is: for each row in the left table, find the rows in the right table where the condition is met.
What is CROSS APPLY? I have read that it's just an inner join that gets evaluated row by row, but isn't an inner join also evaluated row by row?
How do you explain CROSS APPLY in plain English? Is it just an inner join that allows more complicated joins?
APPLY is different from JOIN in that it allows for correlated subqueries. For instance:
SELECT ...
FROM outer
APPLY (
SELECT ..
FROM inner WHERE outer.column = inner.column
)
At first this does not seem like much of a difference, until you consider relational (table-valued) functions. Since APPLY accepts correlations from the other side, you can pass values from the outer row as arguments to the function:
SELECT ...
FROM outer
APPLY function(outer.column)
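This is not possible with JOIN.
CROSS vs. OUTER works the same way as it does with JOINs: CROSS APPLY drops outer rows for which the applied expression returns no rows, while OUTER APPLY keeps them and fills the applied columns with NULL.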
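To make the "pass the outer row to a function" idea concrete, here is a minimal sketch that emulates APPLY in plain Python rather than in a database. The customers data, the orders lookup, and the top_orders function are all hypothetical; they only stand in for an outer table and a table-valued function.

```python
def cross_apply(outer_rows, fn):
    """Emulate CROSS APPLY: outer rows for which fn returns nothing are dropped."""
    return [o + i for o in outer_rows for i in fn(o)]


def outer_apply(outer_rows, fn, inner_width):
    """Emulate OUTER APPLY: every outer row is kept, padded with None (NULL)."""
    result = []
    for o in outer_rows:
        inner = list(fn(o))
        if inner:
            result.extend(o + i for i in inner)
        else:
            result.append(o + (None,) * inner_width)
    return result


# Hypothetical outer table: (customer_id, order_limit)
customers = [(1, 2), (2, 0), (3, 1)]
# Hypothetical inner data, keyed by customer_id
orders = {1: ["a", "b", "c"], 3: ["d"]}


def top_orders(customer):
    """Hypothetical table-valued function: its result depends on the outer row."""
    customer_id, limit = customer
    return [(order,) for order in orders.get(customer_id, [])[:limit]]


cross_rows = cross_apply(customers, top_orders)
outer_rows = outer_apply(customers, top_orders, inner_width=1)
print(cross_rows)
print(outer_rows)
```

The point of the sketch is that the function is called once per outer row with that row's values, which a plain JOIN against a fixed right-hand table cannot express.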
Inner Join (or simply join):
Given 2 tables, A, and B, and a condition C that correlates A and B (most commonly, an equality relation between 2 fields, one from A, and one from B), joining table A with B based on C means that, for each row in A, check for rows in B where C is met - and return them.
Translating it into an example:
SELECT * FROM A inner join B on A.field1 = B.field5
Here, for each row in A, check for rows in B where A's field1 equals B's field5.
Return all such rows.
Cross Join:
A join that is not based on an explicit condition - rather - it combines every row from A with every row from B, and returns such rows.
Assuming A has 10 rows and B 20, you would get a result set of 200 rows.
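The row counts are easy to check. This is a small sketch using Python's built-in sqlite3 module as a stand-in engine; the tables A and B and their single id column are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER)")
conn.execute("CREATE TABLE B (id INTEGER)")
conn.executemany("INSERT INTO A VALUES (?)", [(i,) for i in range(10)])
conn.executemany("INSERT INTO B VALUES (?)", [(i,) for i in range(20)])

# Every row of A paired with every row of B.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM A CROSS JOIN B"
).fetchone()[0]

# Only the pairs that satisfy the join condition.
inner_count = conn.execute(
    "SELECT COUNT(*) FROM A INNER JOIN B ON A.id = B.id"
).fetchone()[0]

print(cross_count, inner_count)
```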
Cross Apply: (which I have just learned about thanks to you :)
A cross apply is indeed related to a cross join, hence it bears "cross" in its name as well. What happens in a cross apply, as far as I understand, is:
Given a table A, and a function F, for each row selected by a given select statement from A, cross join it with the results of F.
Let's say A has 10 rows, and F is simply a function that returns 3 constant rows, like
1
2
3
For each one of the 10 rows from A, you will cross join the 3 resulting rows from F. Resulting in a result set of 30 rows.
As for the purpose this statement was created for, I don't think I can help much.
What I can think of, after reading some SO threads, is that it provides performance gains in such cross join operations (you could achieve the same results without using a function such as F and the "Cross-Apply").
This post provides an example of a scenario where such performance gain is achieved.
Related
What will happen in an Oracle SQL join if I don't use all the tables in the WHERE clause that were mentioned in the FROM clause?
Example:
SELECT A.*
FROM A, B, C, D
WHERE A.col1 = B.col1;
Here I didn't use the C and D tables in the WHERE clause, even though I mentioned them in FROM. Is this OK? Are there any adverse performance issues?
It is poor practice to use that syntax at all. The FROM A,B,C,D syntax has been obsolete since 1992... more than 30 YEARS now. There's no excuse anymore. Instead, every join should always use the JOIN keyword, and specify any join conditions in the ON clause. The better way to write the query looks like this:
SELECT A.*
FROM A
INNER JOIN B ON A.col1 = B.col1
CROSS JOIN C
CROSS JOIN D;
Now we can also see what happens in the question. The query will still run if you fail to specify any conditions for certain tables, but it has the effect of using a CROSS JOIN: the results will include every possible combination of rows from every included relation (where the "A,B" part counts as one relation). If each of the three parts of those joins (A&B, C, D) has just 100 rows, the result set will have 1,000,000 rows (100 * 100 * 100). This is rarely going to give the results you expect or intend, and it's especially suspect when the SELECT clause isn't looking at any of the fields from the uncorrelated tables.
Any table lacking join definition will result in a Cartesian product - every row in the intermediate rowset before the join will match every row in the target table. So if you have 10,000 rows and it joins without any join predicate to a table of 10,000 rows, you will get 100,000,000 rows as a result. There are only a few rare circumstances where this is what you want. At very large volumes it can cause havoc for the database, and DBAs are likely to lock your account.
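A quick way to see the multiplication effect on a small scale, sketched with Python's sqlite3 module rather than Oracle. The table sizes (3, 3, 4 and 5 rows) are arbitrary test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name, n in (("A", 3), ("B", 3), ("C", 4), ("D", 5)):
    conn.execute(f"CREATE TABLE {name} (col1 INTEGER)")
    conn.executemany(
        f"INSERT INTO {name} VALUES (?)", [(i,) for i in range(n)]
    )

# Comma syntax with no predicates for C and D.
comma_count = conn.execute(
    "SELECT COUNT(*) FROM A, B, C, D WHERE A.col1 = B.col1"
).fetchone()[0]

# The same query with the joins spelled out.
explicit_count = conn.execute(
    "SELECT COUNT(*) FROM A "
    "INNER JOIN B ON A.col1 = B.col1 "
    "CROSS JOIN C CROSS JOIN D"
).fetchone()[0]

print(comma_count, explicit_count)
```

Here A joined to B yields 3 rows, and each of those is repeated for every combination of the 4 rows in C and the 5 rows in D.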
If you don't want to use a table, exclude it entirely from your SQL. If you can't because of some constraint we don't know about, then include the proper join predicates for every table in your WHERE clause and simply don't list any of their columns in your SELECT clause. If the join has a cost, you don't need anything from it, and for some very strange reason you still can't leave the table out of your SQL completely (this does occasionally happen in reusable code), then you can disable the joins by making the predicates always false. Remember to use outer joins if you do this.
Native Oracle method:
WITH data AS (SELECT ROWNUM col FROM dual CONNECT BY LEVEL < 10) -- test data
SELECT A.*
FROM data a,
data b,
data c,
data d
WHERE a.col = b.col
AND DECODE('Y','Y',NULL,a.col) = c.col(+)
AND DECODE('Y','Y',NULL,a.col) = d.col(+)
ANSI style:
WITH data AS (SELECT ROWNUM col FROM dual CONNECT BY LEVEL < 10)
SELECT A.*
FROM data a
INNER JOIN data b ON a.col = b.col
LEFT OUTER JOIN data c ON DECODE('Y','Y',NULL,a.col) = c.col
LEFT OUTER JOIN data d ON DECODE('Y','Y',NULL,a.col) = d.col
You can plug in a variable for the first Y that you set to Y or N (e.g. var_disable_join). This will bypass the join and avoid both the associated performance penalty and the Cartesian product effect. But again, I want to reiterate, this is an advanced hack and is probably NOT what you need. Simply leaving out the unwanted tables is the right approach 95% of the time.
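For anyone who wants to try the switch outside Oracle, here is a rough sketch using Python's sqlite3 module. SQLite has no DECODE, so a CASE expression is swapped in as an assumed equivalent, and the :disable parameter plays the role of var_disable_join. It shows the row counts only, not the performance effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (col INTEGER)")
conn.executemany("INSERT INTO data VALUES (?)", [(i,) for i in range(1, 10)])

# CASE replaces Oracle's DECODE here (an assumption made for SQLite).
# When :disable is 'Y' the predicate compares NULL = c.col, which never matches.
QUERY = """
SELECT a.col, c.col
FROM data a
INNER JOIN data b ON a.col = b.col
LEFT OUTER JOIN data c
  ON CASE WHEN :disable = 'Y' THEN NULL ELSE a.col END = c.col
ORDER BY a.col
"""

disabled = conn.execute(QUERY, {"disable": "Y"}).fetchall()
enabled = conn.execute(QUERY, {"disable": "N"}).fetchall()
print(disabled)
print(enabled)
```

With the join disabled, each of the 9 rows still comes back exactly once, with NULL in place of the columns from c, so there is no Cartesian product and no lost rows.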
I have students and their advisers.
I want to print every student with their adviser, including students who don't have one, so I used a left join from student:
select student_name,nvl(adviser_name,'null')
from student left join adviser
on student.adv_id=adviser.adv_Id
This is a left join, but a join is not usually the first solution I think of; I always reach for a plain SELECT statement first. How can I rewrite the previous SQL command as a single or nested SELECT?
tldr; Just use an OUTER join when such is the goal.
from x,y is just another way of writing from x cross join y; and a CROSS join results in a Cartesian product - a WHERE restriction establishing a relationship between the tables makes it equivalent to a normal [INNER] JOIN with an equivalent join ON condition.
.. a Cartesian product is a mathematical operation which returns a set (or product set or simply product) from multiple sets. That is, for sets A and B, the Cartesian product A × B is the set of all ordered pairs (a, b) where a ∈ A and b ∈ B.
That is, unlike a LEFT [OUTER] join, a CROSS join can't "make up the blank records" because they exist in neither A nor B.
Thus one would have to synthesize (ie. join with a derived table containing) the "missing" records before (or after, I suppose) the CROSS join was applied, which leads back to using an OUTER join to create the derived table ..
See this answer for a graphical (but non-Venn diagram) way to visualize the results; note that NULL is only introduced with the OUTER joins; not INNER or CROSS joins - or a CROSS join pretending to be an INNER join.
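The difference shows up as soon as one student has no adviser. A small sketch with Python's sqlite3 module follows; the two sample students and one adviser are invented, and COALESCE is used in place of Oracle's NVL because SQLite has no NVL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE adviser (adv_id INTEGER, adviser_name TEXT);
CREATE TABLE student (student_name TEXT, adv_id INTEGER);
INSERT INTO adviser VALUES (1, 'Dr. Lee');
INSERT INTO student VALUES ('Ann', 1), ('Bob', NULL);
""")

# Comma (cross) join restricted by WHERE: behaves like an INNER join.
comma_rows = conn.execute("""
SELECT student_name, adviser_name
FROM student, adviser
WHERE student.adv_id = adviser.adv_id
ORDER BY student_name
""").fetchall()

# LEFT join: the student without an adviser is kept.
left_rows = conn.execute("""
SELECT student_name, COALESCE(adviser_name, 'null')
FROM student LEFT JOIN adviser
  ON student.adv_id = adviser.adv_id
ORDER BY student_name
""").fetchall()

print(comma_rows)
print(left_rows)
```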
You have identified a solution of finding an area within a bounding box /circle using cross join as below:
SELECT A.ID, C.Car
FROM Cars C
CROSS JOIN Areas A
WHERE C.Latitude BETWEEN A.LatitudeMin AND A.LatitudeMax AND
C.Longitude BETWEEN A.LongitudeMin AND A.LongitudeMax
at:
How to cross join in Big Query using intervals?
However, using a cross join on large data sets is blocked by the GBQ ops team due to constraints on the infrastructure.
Hence my question: how can I find the set of lat/long points in a large table (table A) that fall within a set of bounding boxes held in a small table (table B)?
My query as below has been blocked:
select a.a1, a.a2 , a.mdl, b.name, count(1) count
from TableMaster a
CROSS JOIN places_locations b
where (a.lat
BETWEEN b.bottom_right_lat AND b.top_left_lat)
AND (a.long
BETWEEN b.top_left_long AND b.bottom_right_long)
group by ....
TableMaster is 538 GB with 6,658,716,712 rows (cleaned/absolute minimum)
places_locations varies per query around 5 to 100kb.
I have tried to adapt fake join based on a template:
How to improve performance of GeoIP query in BigQuery?
However, the query takes an hour and produces neither results nor errors.
Could you identify a possible path to solve this puzzle at all?
The problem you're seeing is that the cross join generates too many intermediate values (6 billion x 1k = 6 trillion).
The way to work around this is to generate fewer outputs. If you have additional filters you can apply, you should try applying them before you do the join. If you could do the group by (or part of it) before the join, that would also help.
Moreover, for doing the lookup, you could do a more coarse-grained lookup first. That is, if you could do an initial cross join with a smaller table that has coarse-grained regions, then you could join against the larger table on region id rather than doing a cross join.
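One way the coarse-grained idea could look, sketched with Python's sqlite3 module on a handful of rows rather than in BigQuery. Everything here is an assumption made for the illustration: the 1-degree grid, the cell() helper, and the points, boxes and box_cells tables. Each box is expanded into the grid cells it overlaps, each point gets its own cell, and the big table is then joined on cell id before the exact BETWEEN check.

```python
import math
import sqlite3

CELL = 1.0  # hypothetical grid size in degrees


def cell(value):
    """Map a coordinate to the index of its coarse grid cell."""
    return math.floor(value / CELL)


# Hypothetical data: (id, lat, lon) and (name, lat_min, lat_max, lon_min, lon_max)
points = [(1, 10.2, 20.3), (2, 10.7, 21.5), (3, 50.1, 60.9), (4, -3.4, 7.7)]
boxes = [("box_a", 10.0, 11.0, 20.0, 20.5), ("box_b", -4.0, -3.0, 7.0, 8.5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (id INTEGER, lat REAL, lon REAL, "
             "cell_lat INTEGER, cell_lon INTEGER)")
conn.execute("CREATE TABLE boxes (name TEXT, lat_min REAL, lat_max REAL, "
             "lon_min REAL, lon_max REAL)")
conn.execute("CREATE TABLE box_cells (name TEXT, cell_lat INTEGER, cell_lon INTEGER)")

conn.executemany("INSERT INTO points VALUES (?, ?, ?, ?, ?)",
                 [(i, la, lo, cell(la), cell(lo)) for i, la, lo in points])
conn.executemany("INSERT INTO boxes VALUES (?, ?, ?, ?, ?)", boxes)

# Expand each (small) box into every grid cell it overlaps.
for name, lat_min, lat_max, lon_min, lon_max in boxes:
    for c_lat in range(cell(lat_min), cell(lat_max) + 1):
        for c_lon in range(cell(lon_min), cell(lon_max) + 1):
            conn.execute("INSERT INTO box_cells VALUES (?, ?, ?)",
                         (name, c_lat, c_lon))

# Equi-join on the cell first, then apply the exact bounding-box test.
matches = conn.execute("""
SELECT p.id, b.name
FROM points p
JOIN box_cells bc ON p.cell_lat = bc.cell_lat AND p.cell_lon = bc.cell_lon
JOIN boxes b ON b.name = bc.name
WHERE p.lat BETWEEN b.lat_min AND b.lat_max
  AND p.lon BETWEEN b.lon_min AND b.lon_max
ORDER BY p.id
""").fetchall()

# Reference result from the plain cross join, for comparison only.
brute_force = conn.execute("""
SELECT p.id, b.name
FROM points p CROSS JOIN boxes b
WHERE p.lat BETWEEN b.lat_min AND b.lat_max
  AND p.lon BETWEEN b.lon_min AND b.lon_max
ORDER BY p.id
""").fetchall()

print(matches)
```

The cell size is a trade-off: smaller cells mean fewer false candidates per point but more rows in the expanded box table.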
OK, so the fake join does work in the end. Solution:
select a.B, a.C, count(1) count
from (
  SELECT B, C, A, lat, long
  from [GB_Data.PlacesMasterA]
  WHERE not B is null
) a
JOIN (
  SELECT top_left_lat, top_left_long, bottom_right_lat, bottom_right_long, A
  from [Places.placeABOXA]
) b on a.A = b.A
where (a.lat BETWEEN b.bottom_right_lat AND b.top_left_lat)
  AND (a.long BETWEEN b.top_left_long AND b.bottom_right_long)
group each by B, C
I'm learning Access and SQL, but I have a problem using subqueries in the from clause I can't seem to figure out.
Select *
From (LongSubQuery) as a, (LongSubQuery) as b, (LongSubQuery) as c
Where a.field=b.field=c.field;
This works perfectly as long as each of the statements A, B, and C in the from clause returns a record. If the where clause in any of the three statements prevents the return of a record, then none of the statements will return a result. I've tried various NZ and is not null statements with no luck. I'm suspicious it is actually caused by the last line of code making the fields equivalent. Is there any way around this?
First of all, when you do something like select * from A, B, C (where A, B, C are data sets), you are returning the cartesian product of A, B, C; in other words, you will have #(A)*#(B)*#(C) rows (where #(A) is the number of rows in set A). So, of course, if one of the sets is empty, the whole set is empty.
Possible solution: use outer (left) joins:
select *
from
(select ...) as a
left join (select ...) as b on a.aField = b.aField ...
left join (select ...) as c on b.aField = c.aField ...
A left join returns all the rows from the left side of the relation, together with the matching rows from the right side when the condition is fulfilled, and null values when it is not.
Be careful when you make the relations. Be sure you use the fields you need. Notice that in this case you can define the condition you are using in the where clause directly in the join construction.
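To see both behaviours side by side, here is a small sketch using Python's sqlite3 module instead of Access. The table t and its labels are invented; the third subquery (label = 'z') deliberately returns no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (field INTEGER, label TEXT);
INSERT INTO t VALUES (1, 'x'), (1, 'y');
""")

# Comma syntax: one empty subquery empties the whole product.
comma_rows = conn.execute("""
SELECT a.label, b.label, c.label
FROM (SELECT * FROM t WHERE label = 'x') AS a,
     (SELECT * FROM t WHERE label = 'y') AS b,
     (SELECT * FROM t WHERE label = 'z') AS c
WHERE a.field = b.field AND b.field = c.field
""").fetchall()

# Left joins: rows from a survive, missing sides come back as NULL.
left_rows = conn.execute("""
SELECT a.label, b.label, c.label
FROM (SELECT * FROM t WHERE label = 'x') AS a
LEFT JOIN (SELECT * FROM t WHERE label = 'y') AS b ON a.field = b.field
LEFT JOIN (SELECT * FROM t WHERE label = 'z') AS c ON b.field = c.field
""").fetchall()

print(comma_rows)
print(left_rows)
```

Note that this only protects against b or c being empty; if the first subquery (a) returns nothing, the left joins still return nothing.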
I'm curious as to why we need to use LEFT JOIN since we can use commas to select multiple tables.
What are the differences between LEFT JOIN and using commas to select multiple tables.
Which one is faster?
Here is my code:
SELECT mw.*,
nvs.*
FROM mst_words mw
LEFT JOIN (SELECT no as nonvs,
owner,
owner_no,
vocab_no,
correct
FROM vocab_stats
WHERE owner = 1111) AS nvs ON mw.no = nvs.vocab_no
WHERE (nvs.correct > 0 )
AND mw.level = 1
...and:
SELECT *
FROM vocab_stats vs,
mst_words mw
WHERE mw.no = vs.vocab_no
AND vs.correct > 0
AND mw.level = 1
AND vs.owner = 1111
First of all, to be completely equivalent, the first query should have been written
SELECT mw.*,
nvs.*
FROM mst_words mw
LEFT JOIN (SELECT *
FROM vocab_stats
WHERE owner = 1111) AS nvs ON mw.no = nvs.vocab_no
WHERE (nvs.correct > 0 )
AND mw.level = 1
So that mw.* and nvs.* together produce the same set as the 2nd query's singular *. The query as you have written can use an INNER JOIN, since it includes a filter on nvs.correct.
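The reason the filter turns the LEFT JOIN into an INNER JOIN can be checked on a few rows. This is a sketch with Python's sqlite3 module; the schema is a cut-down, hypothetical version of the tables in the question, and the column mw.no is renamed word_no here purely to avoid any keyword clash in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mst_words (word_no INTEGER, level INTEGER);
CREATE TABLE vocab_stats (vocab_no INTEGER, owner INTEGER, correct INTEGER);
INSERT INTO mst_words VALUES (1, 1), (2, 1), (3, 1);
INSERT INTO vocab_stats VALUES (1, 1111, 5), (2, 1111, 0);
""")

# LEFT JOIN, but the WHERE clause rejects the NULL-extended rows.
left_filtered = conn.execute("""
SELECT mw.word_no
FROM mst_words mw
LEFT JOIN vocab_stats vs ON mw.word_no = vs.vocab_no AND vs.owner = 1111
WHERE vs.correct > 0 AND mw.level = 1
ORDER BY mw.word_no
""").fetchall()

inner = conn.execute("""
SELECT mw.word_no
FROM mst_words mw
INNER JOIN vocab_stats vs ON mw.word_no = vs.vocab_no AND vs.owner = 1111
WHERE vs.correct > 0 AND mw.level = 1
ORDER BY mw.word_no
""").fetchall()

# Moving the filter into ON keeps the outer-join behaviour.
left_kept = conn.execute("""
SELECT mw.word_no, vs.correct
FROM mst_words mw
LEFT JOIN vocab_stats vs
  ON mw.word_no = vs.vocab_no AND vs.owner = 1111 AND vs.correct > 0
WHERE mw.level = 1
ORDER BY mw.word_no
""").fetchall()

print(left_filtered, inner, left_kept)
```

A NULL in vs.correct never satisfies vs.correct > 0, so any row the LEFT JOIN preserved is thrown away again by the WHERE clause.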
The general form
TABLEA LEFT JOIN TABLEB ON <CONDITION>
attempts to find TableB records based on the condition. If that fails, the results from TABLEA are kept, with all the columns from TableB set to NULL. In contrast
TABLEA INNER JOIN TABLEB ON <CONDITION>
also attempts to find TableB records based on the condition. However, when that fails, the particular record from TableA is removed from the output result set.
The ANSI standard for CROSS JOIN produces a Cartesian product between the two tables.
TABLEA CROSS JOIN TABLEB
-- or, in older syntax, simply using commas
TABLEA, TABLEB
The intention of the syntax is that EACH row in TABLEA is joined to EACH row in TABLEB. So 4 rows in A and 3 rows in B produces 12 rows of output. When paired with conditions in the WHERE clause, it sometimes produces the same behaviour of the INNER JOIN, since they express the same thing (condition between A and B => keep or not). However, it is a lot clearer when reading as to the intention when you use INNER JOIN instead of commas.
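Performance-wise, there is no general rule that a DBMS will process a LEFT JOIN faster than an INNER JOIN; if anything, an INNER JOIN usually leaves the optimizer more freedom to reorder the joins. The comma notation can, on some database systems, obscure the intention and lead to a bad query plan - so another plus for SQL92 notation.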
Why do we need LEFT JOIN? If the explanation of LEFT JOIN above is still not enough (keep records in A without matches in B), then consider that to achieve the same, you would need a complex UNION between two sets using the old comma-notation to achieve the same effect. But as previously stated, this doesn't apply to your example, which is really an INNER JOIN hiding behind a LEFT JOIN.
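For the curious, this is roughly what the UNION workaround looks like, sketched with Python's sqlite3 module. The tables a and b and their rows are made up; the second branch of the UNION is what has to synthesize the unmatched rows by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, a_val TEXT);
CREATE TABLE b (a_id INTEGER, b_val TEXT);
INSERT INTO a VALUES (1, 'one'), (2, 'two');
INSERT INTO b VALUES (1, 'match');
""")

left_rows = conn.execute("""
SELECT a.id, a.a_val, b.b_val
FROM a LEFT JOIN b ON a.id = b.a_id
ORDER BY a.id
""").fetchall()

# Comma notation: matched rows, plus the unmatched rows added back with NULL.
union_rows = conn.execute("""
SELECT a.id, a.a_val, b.b_val
FROM a, b
WHERE a.id = b.a_id
UNION ALL
SELECT a.id, a.a_val, NULL
FROM a
WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id)
ORDER BY 1
""").fetchall()

print(left_rows)
print(union_rows)
```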
Notes:
The RIGHT JOIN is the same as LEFT, except that it starts with TABLEB (right side) instead of A.
RIGHT and LEFT JOINS are both OUTER joins. The word OUTER is optional, i.e. it can be written as LEFT OUTER JOIN.
The third type of OUTER join is FULL OUTER join, but that is not discussed here.
Separating the JOIN from the WHERE makes it easy to read, as the join logic cannot be confused with the WHERE conditions. It will also generally be faster as the server will not need to conduct two separate queries and combine the results.
The two examples you've given are not really equivalent, as you have included a sub-query in the first example. This is a better example:
SELECT vs.*, mw.*
FROM mst_words mw
LEFT JOIN vocab_stats vs ON mw.no = vs.vocab_no
WHERE vs.correct > 0
AND mw.level = 1
AND vs.owner = 1111