Is an update with a join standard SQL compliant?

Is the following standard SQL compliant? If not, then why not?
UPDATE a
SET a.Y = 2
FROM TABLE_A a
INNER JOIN TABLE_B b ON
a.X = b.X
WHERE b.Z = blahblah

The ANSI compliant way to write the query is:
UPDATE TABLE_A
SET Y = 2
WHERE EXISTS (SELECT 1 FROM TABLE_B b
WHERE TABLE_A.X = b.X AND b.Z = blahblah);
To the best of my knowledge, neither ANSI nor ISO provides rationales for why they do not do something. I could speculate that the FROM clause causes issues when there are multiple matches on a given row. Personally, I would not want to be in the room during the arguments about which order the updates take place in.

Related

Is it relevant to add verification on both 'on join' and 'where'?

I would like to know if there is any difference between these queries:
1)
SELECT
...
FROM A
JOIN B on B.AId = A.Id and B.X = #x
WHERE
A.Id = 1
and B.X = #x
2)
SELECT
...
FROM A
JOIN B on B.AId = A.Id
WHERE
A.Id = 1
and B.X = #x
3)
SELECT
...
FROM A
JOIN B on B.AId = A.Id and B.X = #x
WHERE
A.Id = 1
Because you are using an INNER JOIN there is no difference. Records are only kept where the join condition is true. If you filter results from b before the join (by specifying in the ON) or afterwards (by specifying in the WHERE) you'll end up with the same result set. (before and after is sort of arbitrary here, but it helps to think through it that way)
Also, your first query is not great since you filter on #x in two different spots. That is superfluous. My preference would be option 2.
If you go back to the origins of SQL, there was no specific "join" syntax; instead, all filtering was part of the where clause:
SELECT
...
FROM A, B
WHERE
A.Id = 1
and B.AId = A.Id
and B.X = #x
So, whilst I absolutely don't wish to promote this older style syntax, it may help you understand that there really is no difference between your options 2 and 3 and the one I have just presented.
Your option 1 has a repeated predicate (and B.X = #x), which most optimizers will probably ignore, but even that option produces the same result.
Filtering in the WHERE clause or in the INNER JOIN condition does the same job. Filtering in WHERE is generally preferred because it reads better. The placement does matter when the join is not an INNER JOIN, for example a LEFT OUTER JOIN or RIGHT OUTER JOIN.
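To make the inner versus outer join point concrete, here is a small sketch using Python's built-in sqlite3 module. The table contents and the value 99 are invented for illustration, and ? stands in for the #x parameter from the question. It only shows the result sets, not anything about performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (Id INTEGER);
    CREATE TABLE B (AId INTEGER, X INTEGER);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1, 10), (1, 20), (2, 10);
""")

# Same filter on B.X, placed either in ON or in WHERE.
queries = {
    "inner_on": "SELECT A.Id, B.X FROM A JOIN B ON B.AId = A.Id AND B.X = ? WHERE A.Id = 1",
    "inner_where": "SELECT A.Id, B.X FROM A JOIN B ON B.AId = A.Id WHERE A.Id = 1 AND B.X = ?",
    "left_on": "SELECT A.Id, B.X FROM A LEFT JOIN B ON B.AId = A.Id AND B.X = ? WHERE A.Id = 1",
    "left_where": "SELECT A.Id, B.X FROM A LEFT JOIN B ON B.AId = A.Id WHERE A.Id = 1 AND B.X = ?",
}

# 99 matches no row in B, which is where the outer join variants diverge.
results = {name: conn.execute(sql, (99,)).fetchall() for name, sql in queries.items()}

for name, rows in results.items():
    print(name, rows)
```

With an INNER JOIN both placements return no rows here. With a LEFT JOIN, the filter in ON keeps the row from A with a NULL for B.X, while the filter in WHERE discards it.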
Which SQL query is faster? Filter on Join criteria or Where clause?

What kind of join is used in a Vertica UPDATE statement?

Vertica has an interesting update syntax when updating a table based on a join value. Instead of using a join to find the update rows, it mandates a syntax like this:
UPDATE a
SET col = b.val
where a.id = b.id
(Note that this syntax is indeed mandated in this case, because Vertica prohibits us from using a where clause that includes a "self-join", that is a join referencing the table being updated, in this case a.)
This syntax is nice, but it's less explicit about the join being used than other SQL dialects. For example, what happens in this case?
UPDATE a
SET col = CASE WHEN b.id IS NULL THEN 0 ELSE b.val END
where a.id = b.id
What happens when a.id has no match in b.id? Does a.col not get updated, as though the condition a.id = b.id represented an inner join of a and b? Or does it get updated to zero, as if the condition were a left outer join?
I think Vertica uses the Postgres standard for this syntax:
UPDATE a
SET col = b.val
FROM b
WHERE a.id = b.id;
This is an INNER JOIN. I agree that it would be nice if Postgres and the derived databases supported explicit JOINs to the update table (as some other databases do). But the answer to your question is that this is an INNER JOIN.
I should note that if you want a LEFT JOIN, you have two options. One is a correlated subquery:
UPDATE a
SET col = (SELECT b.val FROM b WHERE a.id = b.id);
The other is an additional level of JOIN (assuming that id is unique in a):
UPDATE a
SET col = b.val
FROM a a2 LEFT JOIN
b
ON a2.id = b.id
WHERE a.id = a2.id;
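As a rough illustration of the two behaviours, the sketch below uses Python's sqlite3 module rather than Vertica, so it says nothing about Vertica itself. Because older SQLite builds do not support UPDATE ... FROM, the inner-join behaviour is emulated with a WHERE EXISTS guard (my substitution, not the syntax from the answer). The sample rows and the reset statement are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, col INTEGER);
    CREATE TABLE b (id INTEGER, val INTEGER);
    INSERT INTO a VALUES (1, 100), (2, 200);
    INSERT INTO b VALUES (1, 11);
""")

def snapshot():
    return conn.execute("SELECT id, col FROM a ORDER BY id").fetchall()

# Inner-join-like: only rows of a with a match in b are touched.
conn.execute("""
    UPDATE a
    SET col = (SELECT b.val FROM b WHERE b.id = a.id)
    WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
""")
inner_like = snapshot()

# Reset, then left-join-like: the bare correlated subquery touches every row,
# and unmatched rows become NULL.
conn.execute("UPDATE a SET col = id * 100")
conn.execute("UPDATE a SET col = (SELECT b.val FROM b WHERE b.id = a.id)")
left_like = snapshot()

# Reset, then the "zero when unmatched" variant from the question via COALESCE.
conn.execute("UPDATE a SET col = id * 100")
conn.execute("UPDATE a SET col = COALESCE((SELECT b.val FROM b WHERE b.id = a.id), 0)")
left_like_zero = snapshot()

print(inner_like)      # [(1, 11), (2, 200)]
print(left_like)       # [(1, 11), (2, None)]
print(left_like_zero)  # [(1, 11), (2, 0)]
```

The row with id 2 has no match in b: it is left alone in the inner-join-like form and overwritten in the other two.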

2 Outer Joins on Same Table?

Here is a question that has been puzzling me for a few days now. I searched and searched but couldn't find a convincing answer!
Simple question: why is it not allowed to outer join the same table to two other tables in SQL, even when different columns are used? See the queries below. I know I can work around it with a nested subquery or ANSI joins, but why is it restricted in the first place when using the (+) operator?
In this question I'm referring to the error :
ORA-01417: a table may be outer joined to at most one other table
What I want to ask is why this is allowed :
select * from
a, b, c
where a.a1 = b.b1
and a.a2 = c.c1
And why this is not allowed:
select * from
a, b, c
where a.a1(+) = b.b1
and a.a2(+) = c.c1
Please leave ANSI and Nested SubQueries alone
The restriction is described in Oracle documentation: Outer Joins
Oracle recommends that you use the FROM clause OUTER JOIN syntax rather than the Oracle join operator. Outer join queries that use the Oracle join operator (+) are subject to the following rules and restrictions, which do not apply to the FROM clause OUTER JOIN syntax:
...
In a query that performs outer joins of more than two pairs of tables, a single table can be the null-generated table for only one other table. For this reason, you cannot apply the (+) operator to columns of B in the join condition for A and B and the join condition for B and C. Refer to SELECT for the syntax for an outer join.
which basically means (described in ANSI/ISO syntax) that you can't have with the old (+) syntax what is perfectly valid in ANSI/ISO:
--- Query 1 ---
a
RIGHT JOIN b
ON a.x = b.x
RIGHT JOIN c
ON a.y = c.y
or:
--- Query 1b ---
c
LEFT JOIN
b LEFT JOIN a
ON a.x = b.x
ON a.y = c.y
That's only one of the many restrictions of the old Oracle syntax.
As for the reasons for this restriction, it may be implementation details and/or the ambiguity of such joins. While the two joins above are 100% equivalent, the following is not equivalent to them:
--- Query 2 ---
a
RIGHT JOIN c
ON a.y = c.y
RIGHT JOIN b
ON a.x = b.x
See the test in SQL-Fiddle. So the question arises. How should the proprietary join be interpreted, as query 1 or 2?
FROM a, b, c
WHERE a.y (+) = c.y
AND a.x (+) = b.x
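To check that Query 1 and Query 2 really can return different rows, here is a sketch using Python's sqlite3 module. The sample rows are invented. Since older SQLite builds lack RIGHT JOIN, each query is rewritten as a LEFT JOIN against a derived table, in the same spirit as Query 1b; that rewrite and the column aliases (ax, ay, bx, cy) are mine, not from the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER, y INTEGER);
    CREATE TABLE b (x INTEGER);
    CREATE TABLE c (y INTEGER);
    INSERT INTO a VALUES (1, 10);
    INSERT INTO b VALUES (1);
    INSERT INTO c VALUES (99);
""")

# Query 1: (a RIGHT JOIN b) RIGHT JOIN c, so every row of c survives.
query_1 = conn.execute("""
    SELECT ba.ax, ba.ay, ba.bx, c.y
    FROM c
    LEFT JOIN (SELECT a.x AS ax, a.y AS ay, b.x AS bx
               FROM b LEFT JOIN a ON a.x = b.x) AS ba
      ON ba.ay = c.y
""").fetchall()

# Query 2: (a RIGHT JOIN c) RIGHT JOIN b, so every row of b survives.
query_2 = conn.execute("""
    SELECT ca.ax, ca.ay, b.x, ca.cy
    FROM b
    LEFT JOIN (SELECT a.x AS ax, a.y AS ay, c.y AS cy
               FROM c LEFT JOIN a ON a.y = c.y) AS ca
      ON ca.ax = b.x
""").fetchall()

# Columns are (a.x, a.y, b.x, c.y) in both results.
print(query_1)  # [(None, None, None, 99)]
print(query_2)  # [(None, None, 1, None)]
```

With this data, Query 1 keeps the row from c and loses b, while Query 2 keeps the row from b and loses c, so the (+) form would need a rule for picking one of them.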
There is no restriction if a table appears on the left side of (2 or more) outer joins. These are perfectly valid, even with the old syntax:
FROM a
LEFT JOIN b ON a.x = b.x
LEFT JOIN c ON a.y = c.y
...
LEFT JOIN z ON a.q = z.q
and:
FROM a, b, ..., z
WHERE a.x = b.x (+)
AND a.y = c.y (+)
...
AND a.q = z.q (+)
I strongly suggest using the explicit OUTER JOIN syntax. Starting with Oracle 12c this restriction is relaxed, as described in "1.4.3 Enhanced Oracle Native LEFT OUTER JOIN Syntax":
In previous releases of Oracle Database, in a query that performed outer joins of more than two pairs of tables, a single table could be the null-generated table for only one other table. Beginning with Oracle Database 12c, a single table can be the null-generated table for multiple tables.
Code:
CREATE TABLE a AS
SELECT 1 AS a1, 2 AS a2 FROM dual;
CREATE TABLE b AS
SELECT 1 AS b1 FROM dual;
CREATE TABLE c AS
SELECT 3 AS c1 FROM dual;
-- Oracle 12c: code below will work
SELECT *
FROM a, b, c
WHERE a.a1(+) = b.b1
AND a.a2(+) = c.c1;
Output:
A1 A2 B1 C1
- - 1 3
db<>fiddle demo - Oracle 11g will return error
db<>fiddle demo Oracle 12c/18c will return resultset

Effect of style/format on SQL

Ignoring version, what are the best practices for formatting SQL code?
I prefer this way (method A):
select col from a inner join b on a.id = b.id inner join c on b.id = c.id
a colleague prefers another (method B):
select col from a inner join (b inner join c on b.id=c.id) on a.id = b.id
I'd like to know if there is any difference - the query optimiser appears to generate the same execution plan for both. So maybe it is just readability?
This is the first time I've seen SQL written using method B, does anyone else write SQL like this? Personally I find it really difficult to read method B.
EDIT: Please note the code is on one line and in upper case to make both more comparable for the purpose of this question.
I think A is more readable, and most sample code out there uses that style. Both parse the same and produce the same query plan, so as far as SQL Server is concerned, there is no difference.
I normally also uppercase keywords and indent for readability:
SELECT col
FROM a
INNER JOIN b
ON a.id = b.id
INNER JOIN c
ON b.id = c.id
Method B is a subselect-like syntax, but it is parsed the same way as method A. There's no harm in using it. I personally prefer method A too, because it can be read in a linear fashion.
My personal preference is
SELECT col1, col2, col3,
col4, col5
FROM a
INNER JOIN b ON a.id = b.id
INNER JOIN c ON b.id = c.id
WHERE a.col1 = 1
I think consistency is key. I prefer your way over your colleague's for readability.

can this be written with an outer join

The requirement is to copy rows from Table B into Table A. Only rows with an id that doesn't already exist, need to be copied over:
INSERT INTO A(id, x, y)
SELECT id, x, y
FROM B b
WHERE b.id NOT IN (SELECT id FROM A WHERE x='t');
                                    ^^^^^^^^^^^
Now, I was trying to write this with an outer join to compare the explain paths, but I can't write this (efficiently at least).
Note that the SQL highlighted with ^'s makes this tricky.
Try:
INSERT INTO A(id, x, y)
SELECT b.id, b.x, b.y
FROM B b
LEFT JOIN A a
ON a.id = b.id
AND a.x = 't'
WHERE a.id IS NULL
But I prefer the subquery representation as I think it more clearly expresses what you are doing.
Why are you not happy with what you have? If you check your explain plan, I promise you it says that an anti-join is performed, if the optimizer thinks that is the most efficient way (which it most likely will).
For everyone who reads this: SQL is not what actually is executed. SQL is a way of telling the database what you want, not what to do. All decent databases will be able to treat NOT EXISTS and NOT IN as equal (when they are, i.e. when there are no null values) and perform an anti-join. The trick with an outer join and an IS NULL condition doesn't work on SQL Server, though (SQL Server is not clever enough to transform it to an anti-join).
Your query will perform better than the query with outer join.
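The caveat about null values can be checked with a short sketch in Python's sqlite3 module. The rows are made up, and only the SELECT part of the INSERT is run. It compares NOT IN, NOT EXISTS and the LEFT JOIN ... IS NULL form, first without and then with a NULL id in A. It demonstrates result sets only and makes no claim about which plan any particular database picks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, x TEXT, y TEXT);
    CREATE TABLE B (id INTEGER, x TEXT, y TEXT);
    INSERT INTO A VALUES (1, 't', 'a1'), (2, 'f', 'a2');
    INSERT INTO B VALUES (1, 't', 'b1'), (2, 't', 'b2'), (3, 't', 'b3');
""")

forms = {
    "not_in": """SELECT b.id FROM B b
                 WHERE b.id NOT IN (SELECT id FROM A WHERE x = 't')
                 ORDER BY b.id""",
    "not_exists": """SELECT b.id FROM B b
                     WHERE NOT EXISTS (SELECT 1 FROM A a WHERE a.id = b.id AND a.x = 't')
                     ORDER BY b.id""",
    "left_join": """SELECT b.id FROM B b
                    LEFT JOIN A a ON a.id = b.id AND a.x = 't'
                    WHERE a.id IS NULL
                    ORDER BY b.id""",
}

def run_all():
    return {name: conn.execute(sql).fetchall() for name, sql in forms.items()}

without_nulls = run_all()

# A NULL id among the x = 't' rows makes every NOT IN comparison unknown.
conn.execute("INSERT INTO A VALUES (NULL, 't', 'a3')")
with_nulls = run_all()

print(without_nulls)  # all three forms: [(2,), (3,)]
print(with_nulls)     # not_in: [], the other two: [(2,), (3,)]
```

So the three forms agree only as long as the subquery cannot return NULL, which is the condition under which a database may treat them as the same anti-join.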
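I guess the following query will do the job (the join has to match on x='t' and keep only the rows of B without a match):
INSERT INTO A(id, x, y)
SELECT b.id, b.x, b.y
FROM B b
LEFT JOIN A a
ON b.id = a.id AND a.x='t'
WHERE a.id IS NULL
Or, with NOT EXISTS:
INSERT INTO A (id, x, y)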
SELECT
B.id, B.x, B.y
FROM
B
WHERE
NOT EXISTS (SELECT * FROM A WHERE B.id = A.id AND A.x = 't')