What actually is a self-join? - sql

I have several questions about self-joins; could anyone help answer them?
Is there a strict format for a self-join? There are samples like this:
SELECT a.column_name, b.column_name...
FROM table1 a, table1 b
WHERE a.common_field = b.common_field;
But there are also samples like this:
SELECT a.ID, b.NAME, a.SALARY
FROM CUSTOMERS a, CUSTOMERS b
WHERE a.SALARY < b.SALARY;
I wonder whether the join condition (a.common_field = b.common_field) is necessary, since both formats are self-joins.
How will a self-join be optimized? Will it be treated as an INNER JOIN or a CROSS JOIN? In particular, is the second format a self CROSS JOIN? Are they treated the same way in SQLite and PostgreSQL?
My actual problem is that I want to extract a structure from a bunch of graph-like data, and my query looks like this:
SELECT A.column, B.column, .... N.column
FROM
table1 as A, table1 as B, table1 as C .... table2 as M, table2 as N ....
where
A.column1 < B.column1 and
C.column1 = D.column1 and
....
In the query, table1, table2, ... are single-column tables; they are components of the final structure. Is this kind of self-join format the best approach for my problem? I find it's very slow in PostgreSQL but fast in SQLite, which confuses me.

A self join is no different than any other join as far as structure/behavior goes, but they are typically used in different ways.
You should ditch the older syntax of comma-separated lists of tables and use explicit ANSI joins:
SELECT a.column_name, b.column_name...
FROM table1 a
JOIN table1 b
ON a.common_field = b.common_field;
You can specify what type of JOIN you want (JOIN, LEFT JOIN, RIGHT JOIN, CROSS JOIN, ...) and how you want to relate the tables to each other, just like any other join. Equality is not required, as your a.SALARY < b.SALARY example shows.
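To see that the equality condition is optional, here is a small sketch using Python's built-in sqlite3 module (SQLite, not PostgreSQL; the CUSTOMERS rows are invented for illustration). It runs the comma form and the explicit JOIN ... ON form of the salary query and checks that they return the same rows, while a self CROSS JOIN with no condition returns every pair.

```python
import sqlite3

# In-memory SQLite database; the CUSTOMERS rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, SALARY INTEGER);
    INSERT INTO CUSTOMERS VALUES (1, 'Ann', 1000), (2, 'Bob', 2000), (3, 'Cat', 3000);
""")

# Comma syntax: the non-equality condition lives in WHERE.
comma_rows = conn.execute("""
    SELECT a.ID, b.NAME, a.SALARY
    FROM CUSTOMERS a, CUSTOMERS b
    WHERE a.SALARY < b.SALARY
    ORDER BY a.ID, b.ID
""").fetchall()

# Explicit join: the same condition moved into ON; no equality is needed.
join_rows = conn.execute("""
    SELECT a.ID, b.NAME, a.SALARY
    FROM CUSTOMERS a
    JOIN CUSTOMERS b ON a.SALARY < b.SALARY
    ORDER BY a.ID, b.ID
""").fetchall()

# Without any condition, a self CROSS JOIN pairs every row with every row (3 x 3).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM CUSTOMERS a CROSS JOIN CUSTOMERS b"
).fetchone()[0]

print(join_rows)    # [(1, 'Bob', 1000), (1, 'Cat', 1000), (2, 'Cat', 2000)]
print(cross_count)  # 9
```

Both forms are inner joins with a non-equi predicate; only the placement of the condition differs.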

No, there's no such thing as a strict format.
A self-join is just the special case of joining a table with itself. Think of it as joining two instances of the same thing (in fact, not two instances but two references to the same table).
In general you will use an inner self-join, but you can also cross join or outer join a table with itself.
Example:
select * from tbPeople p0
join tbPeople p1 on p1.id = p0.parentId
where p0.id = you
That returns a row pairing you with your parent.
select * from tbPeople p0
left join tbPeople p1 on p1.parentId = p0.id
where p0.id = you
That returns one row per child of yours, or just you (with NULLs in the child columns) if you don't have offspring yet.
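A runnable sketch of both queries, assuming a hypothetical tbPeople table with a single nullable parentId column (so each person has at most one recorded parent) and using SQLite through Python's sqlite3 module. The ? placeholder stands in for "you".

```python
import sqlite3

# Hypothetical tbPeople table: parentId points at another row of the same table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbPeople (id INTEGER PRIMARY KEY, name TEXT, parentId INTEGER);
    INSERT INTO tbPeople VALUES
        (1, 'Grandma', NULL),
        (2, 'Mum',     1),
        (3, 'Kid A',   2),
        (4, 'Kid B',   2);
""")

def parent_of(person_id):
    # Inner self-join: a row only if the person's parentId matches someone.
    return conn.execute("""
        SELECT p0.name, p1.name
        FROM tbPeople p0
        JOIN tbPeople p1 ON p1.id = p0.parentId
        WHERE p0.id = ?
    """, (person_id,)).fetchall()

def children_of(person_id):
    # Left self-join: one row per child, or one row with NULL if there are none.
    return conn.execute("""
        SELECT p0.name, p1.name
        FROM tbPeople p0
        LEFT JOIN tbPeople p1 ON p1.parentId = p0.id
        WHERE p0.id = ?
        ORDER BY p1.id
    """, (person_id,)).fetchall()

print(parent_of(2))    # [('Mum', 'Grandma')]
print(parent_of(1))    # []
print(children_of(2))  # [('Mum', 'Kid A'), ('Mum', 'Kid B')]
print(children_of(3))  # [('Kid A', None)]
```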

Related

Join conditions, intermediate SQL

What's the difference between the join conditions ON and USING, if both are used to specify the column(s) to join on?
The main difference with USING is that the join columns must have the same name in both tables. This is generally good practice in the data model anyway.
Another important difference is that USING names the columns without saying which table they come from; the join condition doesn't specify the tables (some people view this as a weakness, but you'll see it is quite useful).
A handy feature is that the common columns used for the join appear only once when you use select *. So
select *
from a join
b
on a.x = b.x
will result in x appearing twice in the result set, which is not allowed in views (and in many databases not in subqueries either). On the other hand, this query has x only once in the result set:
select *
from a join
b
using (x)
Of course, other columns could be duplicated.
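The column-list difference can be checked with Python's sqlite3 module, whose cursor.description exposes the result columns. The tables a and b and their columns p and q are hypothetical. SQLite documents that with USING the right-hand copy of each USING column is omitted from the joined result; the exact names it reports for unaliased columns are not guaranteed, so this sketch only relies on the number of columns.

```python
import sqlite3

# Two hypothetical tables that share a column named x.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER, p TEXT);
    CREATE TABLE b (x INTEGER, q TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (1, 'b1'), (3, 'b3');
""")

def column_names(sql):
    # cursor.description has one entry per result column; item 0 is its name.
    return [d[0] for d in conn.execute(sql).description]

on_cols = column_names("SELECT * FROM a JOIN b ON a.x = b.x")
using_cols = column_names("SELECT * FROM a JOIN b USING (x)")
using_rows = conn.execute("SELECT * FROM a JOIN b USING (x)").fetchall()

print(len(on_cols), len(using_cols))  # 4 3  (x twice with ON, once with USING)
print(using_rows)                     # [(1, 'a1', 'b1')]
```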
For an outer join, the value of the USING column is the non-NULL value, if any. This becomes quite handy for full joins:
select *
from a full join
b
using (x) full join
c
using (x);
Because of the NULL values, expressing this without USING is rather cumbersome:
select *
from a full join
b
on b.x = a.x full join
c
on c.x = coalesce(a.x, b.x);
USING is just a shorthand for expressing the join condition when the related columns have the same name.
Consider the following example:
select ...
from orders o
inner join order_items oi on oi.order_id = o.order_id
This can be shortened with using, as follows:
select ...
from orders o
inner join order_items oi using(order_id)
Notes:
this also works when joining on several columns having identical names
parentheses are mandatory with using
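Both notes can be checked in SQLite through Python's sqlite3 module. In this sketch the shop_id column is a hypothetical second key added to orders and order_items so there are two identically named join columns; the rows are invented. It compares the ON and USING forms and confirms that SQLite rejects USING without parentheses.

```python
import sqlite3

# Hypothetical tables joined on two identically named columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (shop_id INTEGER, order_id INTEGER, customer TEXT);
    CREATE TABLE order_items (shop_id INTEGER, order_id INTEGER, item TEXT);
    INSERT INTO orders VALUES (1, 10, 'ann'), (2, 10, 'bob');
    INSERT INTO order_items VALUES (1, 10, 'pen'), (2, 10, 'ink'), (2, 10, 'pad');
""")

# Join on both columns with an explicit ON condition.
with_on = conn.execute("""
    SELECT o.customer, oi.item
    FROM orders o
    INNER JOIN order_items oi
        ON oi.shop_id = o.shop_id AND oi.order_id = o.order_id
    ORDER BY o.customer, oi.item
""").fetchall()

# The same join written with USING on both columns.
with_using = conn.execute("""
    SELECT o.customer, oi.item
    FROM orders o
    INNER JOIN order_items oi USING (shop_id, order_id)
    ORDER BY o.customer, oi.item
""").fetchall()

# Without parentheses the statement is a syntax error in SQLite.
try:
    conn.execute("SELECT * FROM orders o JOIN order_items oi USING order_id")
    parens_required = False
except sqlite3.OperationalError:
    parens_required = True

print(with_using)       # [('ann', 'pen'), ('bob', 'ink'), ('bob', 'pad')]
print(parens_required)  # True
```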

Multiple joins give "SQL statement not properly ended"

Here is my query:
select p1.driver_id, p1.name, o1.driver_id, o1.license, part.license, part.amount
from
(person p1 full outer join owns o1 on p1.driver_id = o1.driver_id) and
(Participated part full outer join ol on part.license = o1.license)
I get the "SQL statement not properly ended" error. I just want to build this table as one step in solving a more complicated problem.
That is not how you define the relations in the FROM part. Try the following:
select p1.driver_id, p1.name, o1.driver_id, o1.license, part.license, part.amount
from
person p1
full outer join owns o1 on p1.driver_id = o1.driver_id
full outer join Participated part on part.license = o1.license;
Notes:
After FROM you can list several relations (comma-separated), and a relation can also have the form a FULL OUTER JOIN b ON ..., as you correctly wrote.
You can also chain these (with parentheses if you prefer) as
a FULL OUTER JOIN b ON ... FULL OUTER JOIN c ON ...
You cannot use AND to connect relations; AND only connects conditions.
You can parenthesize a query to form a subquery, such as
(SELECT * FROM a FULL OUTER JOIN b ON ...) AS t
but note that you need the SELECT * at the beginning to turn it into a relation, and in most systems you need an alias after it (t in the example).
SELECT p1.driver_id, p1.name, o1.driver_id, o1.license, part.license, part.amount
FROM person p1
FULL OUTER JOIN owns o1
ON p1.driver_id = o1.driver_id
FULL OUTER JOIN Participated part
ON part.license = o1.license
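The chained form can be tried in SQLite via Python's sqlite3 module. This sketch swaps FULL OUTER JOIN for LEFT JOIN because SQLite only added FULL OUTER JOIN in version 3.39.0 and the SQLite library bundled with Python may be older; the chaining works the same way. The table contents are invented. It also shows a parenthesized SELECT used as a relation with an alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (driver_id TEXT, name TEXT);
    CREATE TABLE owns (driver_id TEXT, license TEXT);
    CREATE TABLE participated (license TEXT, amount INTEGER);
    INSERT INTO person VALUES ('d1', 'Ann'), ('d2', 'Bob');
    INSERT INTO owns VALUES ('d1', 'L-1');
    INSERT INTO participated VALUES ('L-1', 500);
""")

# Each JOIN clause attaches one more table; its ON may refer to tables joined earlier.
rows = conn.execute("""
    SELECT p1.driver_id, p1.name, o1.driver_id, o1.license, part.license, part.amount
    FROM person p1
    LEFT JOIN owns o1 ON p1.driver_id = o1.driver_id
    LEFT JOIN participated part ON part.license = o1.license
    ORDER BY p1.driver_id
""").fetchall()

# A parenthesized SELECT becomes a relation; here it is given the alias t.
sub = conn.execute("""
    SELECT t.name, t.license
    FROM (
        SELECT p1.name, o1.license
        FROM person p1
        LEFT JOIN owns o1 ON p1.driver_id = o1.driver_id
    ) AS t
    ORDER BY t.name
""").fetchall()

print(rows)  # [('d1', 'Ann', 'd1', 'L-1', 'L-1', 500), ('d2', 'Bob', None, None, None, None)]
print(sub)   # [('Ann', 'L-1'), ('Bob', None)]
```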

Adding more conditions in the JOIN or in the WHERE clause: which is better?

SELECT C.*
FROM Content C
INNER JOIN ContentPack CP ON C.ContentPackId = CP.ContentPackId
AND CP.DomainId = #DomainId
...and:
SELECT C.*
FROM Content C
INNER JOIN ContentPack CP ON C.ContentPackId = CP.ContentPackId
WHERE CP.DomainId = #DomainId
Is there any performance difference between these two queries?
Because both queries use an INNER JOIN, there is no difference -- they're equivalent.
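That wouldn't be the case with an OUTER JOIN: criteria in the ON clause determine which rows match during the join (rows from the preserved side that find no match are still returned, with NULLs), while criteria in the WHERE clause filter the result after the join.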
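A small SQLite sketch (via Python's sqlite3 module, with invented rows and domain_id standing in for #DomainId) of where the difference shows up: with a LEFT JOIN, moving the DomainId test from ON to WHERE changes the result, while with an INNER JOIN both placements return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Content (ContentId INTEGER, ContentPackId INTEGER);
    CREATE TABLE ContentPack (ContentPackId INTEGER, DomainId INTEGER);
    INSERT INTO Content VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO ContentPack VALUES (100, 7), (200, 8);
""")
domain_id = 7  # stands in for the #DomainId parameter

def run(join_kind, extra_in_on):
    # Put the DomainId test either in ON or in WHERE, for the given join kind.
    on_extra = " AND CP.DomainId = ?" if extra_in_on else ""
    where = "" if extra_in_on else " WHERE CP.DomainId = ?"
    sql = (f"SELECT C.ContentId, CP.DomainId FROM Content C "
           f"{join_kind} JOIN ContentPack CP "
           f"ON C.ContentPackId = CP.ContentPackId{on_extra}{where} "
           f"ORDER BY C.ContentId")
    return conn.execute(sql, (domain_id,)).fetchall()

left_on = run("LEFT", True)       # every Content row kept; non-matching packs are NULL
left_where = run("LEFT", False)   # NULL-extended rows removed by WHERE
inner_on = run("INNER", True)
inner_where = run("INNER", False)

print(left_on)     # [(1, 7), (2, None), (3, None)]
print(left_where)  # [(1, 7)]
```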
But your query would likely run better as:
SELECT c.*
FROM CONTENT c
WHERE EXISTS (SELECT NULL
FROM CONTENTPACK cp
WHERE cp.contentpackid = c.contentpackid
AND cp.domainid = #DomainId)
Using a JOIN risks duplicates if there's more than one CONTENTPACK record related to a CONTENT record. And it's pointless to JOIN if your query is not using columns from the table being JOINed to... JOINs are not always the fastest way.
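The duplicate risk can be sketched in SQLite through Python's sqlite3 module. The data is invented: ContentPackId 100 appears twice in ContentPack, as could happen without a unique key. The JOIN returns content row 1 twice, while the EXISTS version returns it once.

```python
import sqlite3

# Hypothetical data: ContentPackId 100 appears twice in ContentPack.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Content (ContentId INTEGER, ContentPackId INTEGER);
    CREATE TABLE ContentPack (ContentPackId INTEGER, DomainId INTEGER);
    INSERT INTO Content VALUES (1, 100), (2, 200);
    INSERT INTO ContentPack VALUES (100, 7), (100, 7), (200, 8);
""")
domain_id = 7  # stands in for #DomainId

# INNER JOIN: Content row 1 matches two ContentPack rows, so it comes back twice.
joined = conn.execute("""
    SELECT C.ContentId
    FROM Content C
    INNER JOIN ContentPack CP ON C.ContentPackId = CP.ContentPackId
    WHERE CP.DomainId = ?
""", (domain_id,)).fetchall()

# EXISTS: only tests for a match, so each Content row appears at most once.
with_exists = conn.execute("""
    SELECT c.ContentId
    FROM Content c
    WHERE EXISTS (SELECT NULL
                  FROM ContentPack cp
                  WHERE cp.ContentPackId = c.ContentPackId
                    AND cp.DomainId = ?)
""", (domain_id,)).fetchall()

print(joined)       # [(1,), (1,)]
print(with_exists)  # [(1,)]
```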
There's no performance difference, but I would prefer putting the condition in the inner join, because I think it makes very clear what you are trying to join the two tables on.

What is the difference between using a cross join and putting a comma between the two tables?

What is the difference between these two queries?
select * from A, B
select * from A cross join B
They seem to return the same results.
Is the second version preferred over the first? Is the first version completely syntactically wrong?
They return the same results because they are semantically identical. This:
select *
from A, B
...is (wince) ANSI-89 syntax. Without a WHERE clause linking the tables, the result is a Cartesian product, which is exactly what the alternative provides as well:
select *
from A
cross join B
...but the CROSS JOIN is ANSI-92 syntax.
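A quick check in SQLite (through Python's sqlite3 module, with invented tables A and B) that the two spellings return the same Cartesian product:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a INTEGER);
    CREATE TABLE B (b TEXT);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES ('x'), ('y'), ('z');
""")

# Sort in Python because neither query has an ORDER BY.
comma = sorted(conn.execute("SELECT * FROM A, B").fetchall())
cross = sorted(conn.execute("SELECT * FROM A CROSS JOIN B").fetchall())

print(len(comma), comma == cross)  # 6 True
```

One SQLite-specific caveat: SQLite's query planner treats an explicit CROSS JOIN as an instruction not to reorder the two tables (the left one stays the outer loop), so in SQLite the two spellings can get different join orders even though they return the same rows.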
About Performance
There's no performance difference between them.
Why Use ANSI-92?
The reason to use ANSI-92 syntax is OUTER JOIN support (i.e., LEFT, FULL, RIGHT). ANSI-89 syntax doesn't have any, so many databases implemented their own, which doesn't port to other databases, e.g. Oracle's (+) and SQL Server's *= / =*.
I stumbled upon this post from another SO question, but a big difference is the scoping that CROSS JOIN creates. For example, if you use CROSS APPLY or another join after B in the first ('comma') variant, the CROSS APPLY or join can only refer to the table(s) after the comma. E.g., the following:
select * from A, B join C on C.SomeField = A.SomeField and C.SomeField = B.SomeField
would create an error:
The multi-part identifier "A.SomeField" could not be bound.
because the join on C is scoped only to B, whereas the same with CROSS JOIN...
select * from A cross join B join C on C.SomeField = A.SomeField and C.SomeField = B.SomeField
...is accepted. The same applies to CROSS APPLY: for example, with a CROSS APPLY of a function placed after B, the function can only use fields of B, whereas the same query written with CROSS JOIN can use fields from both A and B.
Of course, this also works the other way round: if you want a join to apply to only one of the tables, you can achieve that by using the comma between the tables.
They are the same and should (almost) never be used.
Besides brevity (favoring ,) and consistency (favoring CROSS JOIN), the sole difference is precedence.
The comma has lower precedence than other joins.
For example, the explicit form of
SELECT *
FROM a
CROSS JOIN b
JOIN c ON a.id = c.id
is
SELECT *
FROM (
a
CROSS JOIN b
)
INNER JOIN c ON a.id = c.id
which is valid.
Whereas the explicit form of
SELECT *
FROM a,
b
JOIN c ON a.id = c.id
is
SELECT *
FROM a
CROSS JOIN (
b
INNER JOIN c ON a.id = c.id
)
which is invalid (the join condition references a, which is not accessible inside the parentheses).
In your example, there are only two tables, so the two queries are exactly equivalent.
The first version was originally the only way to join two tables. But it has a number of problems, so the JOIN keyword was added in the ANSI-92 standard. Both give the same results, but the second is more explicit and is to be preferred.
To add to the answers already given:
select * from A, B
This was the only way of joining prior to the 1992 SQL standard. So if you wanted an inner join, you'd have to use the WHERE clause for the criteria:
select * from A, B
where A.x = B.y;
One problem with this syntax was that there was no standard for outer joins. Another was that this gets unreadable with many tables and is hence prone to errors and less maintainable.
select * from A, B, C, D
where B.id = C.id_b
and C.id_d = D.id;
Here we have a cross join of A with B/C/D. On purpose or not? Maybe the programmer just forgot the and B.id = A.id_b (or whatever), or maybe this line was deleted by mistake, and maybe still it was really meant to be a cross join. Who could say?
Here is the same with explicit joins
select *
from A
cross join B
inner join C on C.id_b = B.id
inner join D on D.id = C.id_d;
No doubt about the programmer's intentions anymore.
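A small sketch of that risk, using SQLite through Python's sqlite3 module with invented rows. The comma query that lacks the A-to-B link and the explicit CROSS JOIN version both multiply every A row into the result; the version that adds the hypothetical missing condition A.id_b = B.id does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id_b INTEGER);
    CREATE TABLE B (id INTEGER);
    CREATE TABLE C (id_b INTEGER, id_d INTEGER);
    CREATE TABLE D (id INTEGER);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1), (2);
    INSERT INTO C VALUES (1, 10), (2, 20);
    INSERT INTO D VALUES (10), (20);
""")

def count(sql):
    return conn.execute(sql).fetchone()[0]

# Comma syntax with no condition on A: silently a cross join of A with B/C/D.
missing = count("""
    SELECT COUNT(*) FROM A, B, C, D
    WHERE B.id = C.id_b AND C.id_d = D.id
""")

# Same result, but the cross join is spelled out.
explicit_cross = count("""
    SELECT COUNT(*) FROM A
    CROSS JOIN B
    INNER JOIN C ON C.id_b = B.id
    INNER JOIN D ON D.id = C.id_d
""")

# With the (hypothetical) missing link, each A row matches one B row.
linked = count("""
    SELECT COUNT(*) FROM A
    INNER JOIN B ON B.id = A.id_b
    INNER JOIN C ON C.id_b = B.id
    INNER JOIN D ON D.id = C.id_d
""")

print(missing, explicit_cross, linked)  # 4 4 2
```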
The old comma-separated syntax was replaced for good reasons and should not be used anymore.
These are the examples of implicit and explicit cross joins. See http://en.wikipedia.org/wiki/Join_%28SQL%29#Cross_join.
Regarding the comments about the utility of cross joins: there is one very useful and valid use of cross joins (or commas) in the admittedly somewhat obscure world of PostgreSQL generate_series and PostGIS spatial SQL, where you can cross join against generate_series to extract the nth geometry out of a GeometryCollection or a Multi(Polygon/Point/LineString). See: http://postgis.refractions.net/documentation/manual-1.4/ST_GeometryN.html
SELECT n, ST_AsEWKT(ST_GeometryN(the_geom, n)) As geomewkt
FROM (
VALUES (ST_GeomFromEWKT('MULTIPOINT(1 2 7, 3 4 7, 5 6 7, 8 9 10)') ),
( ST_GeomFromEWKT('MULTICURVE(CIRCULARSTRING(2.5 2.5,4.5 2.5, 3.5 3.5), (10 11, 12 11))') )
) As foo(the_geom)
CROSS JOIN generate_series(1,100) n
WHERE n <= ST_NumGeometries(the_geom);
This can be very useful if you want to get the area, centroid, bounding box or many of the other operations you can perform on a single geometry, when they are contained within a larger one.
I had always written such queries with a comma before generate_series, until one day I wondered whether it really meant a cross join, which brought me to this post. Obscure, but definitely useful.
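The same cross-join-against-a-series pattern can be sketched without PostgreSQL or PostGIS. This is an analogy, not the PostGIS query: it uses SQLite via Python's sqlite3 module, builds the 1..100 series with a recursive CTE instead of generate_series, and uses invented strings as stand-in "collections", with length() and substr() playing the roles of ST_NumGeometries and ST_GeometryN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

rows = conn.execute("""
    WITH RECURSIVE series(n) AS (
        -- 1..100, standing in for generate_series(1, 100)
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM series WHERE n < 100
    ),
    foo(coll) AS (
        -- invented "collections": each character is one element
        SELECT 'abc' UNION ALL SELECT 'xy'
    )
    SELECT coll, n, substr(coll, n, 1) AS item
    FROM foo
    CROSS JOIN series
    WHERE n <= length(coll)  -- keep only n up to the element count
    ORDER BY coll, n
""").fetchall()

print(rows)
# [('abc', 1, 'a'), ('abc', 2, 'b'), ('abc', 3, 'c'), ('xy', 1, 'x'), ('xy', 2, 'y')]
```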

What's the difference between just using multiple froms and joins?

Say I have this query:
SELECT bugs.id, bug_color.name FROM bugs, bug_color
WHERE bugs.id = 1 AND bugs.id = bug_color.id
Why would I use a join? And what would it look like?
Joins are syntactic sugar; they are easier to read.
Your query would look like this with a join:
SELECT bugs.id, bug_color.name
FROM bugs
INNER JOIN bug_color ON bugs.id = bug_color.id
WHERE bugs.id = 1
With more than two tables, joins help make a query more readable by keeping the conditions related to a table in one place.
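A quick SQLite check (via Python's sqlite3 module, with invented bugs and bug_color rows) that the comma query from the question and the INNER JOIN rewrite return the same rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bugs (id INTEGER PRIMARY KEY);
    CREATE TABLE bug_color (id INTEGER, name TEXT);
    INSERT INTO bugs VALUES (1), (2);
    INSERT INTO bug_color VALUES (1, 'red'), (2, 'blue');
""")

# Original form: tables listed with commas, join condition mixed into WHERE.
comma_rows = conn.execute("""
    SELECT bugs.id, bug_color.name FROM bugs, bug_color
    WHERE bugs.id = 1 AND bugs.id = bug_color.id
""").fetchall()

# Join form: the join condition sits in ON, the filter stays in WHERE.
join_rows = conn.execute("""
    SELECT bugs.id, bug_color.name
    FROM bugs
    INNER JOIN bug_color ON bugs.id = bug_color.id
    WHERE bugs.id = 1
""").fetchall()

print(comma_rows, join_rows)  # [(1, 'red')] [(1, 'red')]
```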
The join keyword is the new way of joining tables.
When I learned SQL it did not yet exist, so joining was done the way that you show in your question.
Nowadays we have things like joins and aliases to make the queries more readable:
select
b.id, c.name
from
bugs as b
inner join
bug_color as c on c.id = b.id
where
b.id = 1
There are also other kinds of joins, such as left outer join, right outer join and full join, which are harder to accomplish with the old syntax.
Join syntax allows for outer joins, so you can write:
SELECT bugs.id, bug_color.name
FROM bugs
LEFT OUTER JOIN bug_color ON bugs.id = bug_color.id
WHERE bugs.id = 1
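What the outer join buys you, sketched in SQLite through Python's sqlite3 module with invented rows: a bug that has no bug_color row disappears from the inner join but is kept by the LEFT OUTER JOIN, with NULL for the color name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bugs (id INTEGER PRIMARY KEY);
    CREATE TABLE bug_color (id INTEGER, name TEXT);
    INSERT INTO bugs VALUES (1), (2);
    INSERT INTO bug_color VALUES (1, 'red');  -- bug 2 has no color row
""")

def colors(join_kind, bug_id):
    return conn.execute(f"""
        SELECT bugs.id, bug_color.name
        FROM bugs
        {join_kind} JOIN bug_color ON bugs.id = bug_color.id
        WHERE bugs.id = ?
    """, (bug_id,)).fetchall()

inner_rows = colors("INNER", 2)      # no match, so no row
left_rows = colors("LEFT OUTER", 2)  # bug kept, name is NULL

print(inner_rows, left_rows)  # [] [(2, None)]
```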