UPDATE and FROM - sql

I saw this query in one of the posts; it updates a table from the records of another table, but I couldn't understand the logic behind it. In particular, I have never used FROM in an UPDATE query.
And...
(To be blunt, I am looking for insight into how to translate, debug, or trace a SQL statement, or in a nutshell, understand how each line contributes to the result. When I see a line of code I know what it is doing, but so far I have only been able to memorize the structure of SQL queries through practice.
I want to know which line is processed first, what comes next, and so on, so that I can write more complex code. Could you please point me to a reference?)
UPDATE
T
SET
T.col1 = OT.col1,
T.col2 = OT.col2
FROM
Some_Table T
INNER JOIN
Other_Table OT
ON
T.id = OT.id
WHERE
T.col3 = 'cool'

First of all, try this:
SELECT
T.col1 , OT.col1,
T.col2 , OT.col2
FROM
Some_Table T
INNER JOIN
Other_Table OT
ON
T.id = OT.id
WHERE
T.col3 = 'cool'
This will show you what the result of the update would be: each OT column holds the value that will be written over the T column next to it.
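The preview-then-update idea can be tried out with a minimal sketch like the one below, which uses Python's built-in sqlite3 module. The table and column names follow the question, but the sample rows are made up for illustration. Because the UPDATE ... FROM syntax differs between engines, the sketch swaps in a portable form of the same update written with correlated subqueries; it is not the T-SQL statement itself.

```python
import sqlite3

def preview_then_update():
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data; only the names come from the question.
    conn.executescript("""
        CREATE TABLE Some_Table  (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT);
        CREATE TABLE Other_Table (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT);
        INSERT INTO Some_Table  VALUES (1, 'a', 'b', 'cool'), (2, 'c', 'd', 'dull'), (3, 'e', 'f', 'cool');
        INSERT INTO Other_Table VALUES (1, 'A', 'B'), (2, 'C', 'D');
    """)
    # Step 1: preview which rows would change, old value next to new value.
    preview = conn.execute("""
        SELECT T.id, T.col1, OT.col1, T.col2, OT.col2
        FROM Some_Table T
        INNER JOIN Other_Table OT ON T.id = OT.id
        WHERE T.col3 = 'cool'
        ORDER BY T.id
    """).fetchall()
    # Step 2: the same join and filter, expressed as a portable UPDATE.
    # The EXISTS test plays the role of the INNER JOIN: unmatched rows are left alone.
    cur = conn.execute("""
        UPDATE Some_Table
        SET col1 = (SELECT OT.col1 FROM Other_Table OT WHERE OT.id = Some_Table.id),
            col2 = (SELECT OT.col2 FROM Other_Table OT WHERE OT.id = Some_Table.id)
        WHERE col3 = 'cool'
          AND EXISTS (SELECT 1 FROM Other_Table OT WHERE OT.id = Some_Table.id)
    """)
    changed = cur.rowcount
    rows = conn.execute("SELECT id, col1, col2, col3 FROM Some_Table ORDER BY id").fetchall()
    conn.close()
    return preview, changed, rows
```

With this data the preview returns one row (id 1), and the update then changes exactly that row: id 2 is filtered out by col3, and id 3 has no match in Other_Table.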

First off, I found that different SQL engines process the clauses in different orders internally, so there is no single way to read and understand SQL line by line as you would with procedural code.
I also found out that FROM is usually processed first, and processing then works its way up to SELECT. Each SQL command has its own logical order, so I simply think about SQL as a human language and then try to find the logical way to do what I think it should do; in most cases this works.
For example, in a SELECT, the machine should first fetch the table, then work out which conditions apply, so WHERE is next, and then the data should be prepared for viewing, so SELECT is processed after that. This is the closest thing I found. :)
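One way to see this logical order in action is the small sketch below, again using Python's sqlite3 module. The orders table and its rows are hypothetical. It shows that WHERE filters individual rows before any grouping happens, that HAVING filters after grouping, and that an aggregate inside WHERE is rejected because the groups do not exist yet at that point.

```python
import sqlite3

def demo_logical_order():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, used only to illustrate clause order.
    conn.executescript("""
        CREATE TABLE orders (customer TEXT, amount INTEGER);
        INSERT INTO orders VALUES ('ann', 10), ('ann', 30), ('bob', 5), ('bob', 7), ('cy', 50);
    """)
    # WHERE is applied to single rows before GROUP BY, so bob's 5 is gone before counting.
    where_first = conn.execute("""
        SELECT customer, COUNT(*) AS n
        FROM orders
        WHERE amount > 6
        GROUP BY customer
        ORDER BY customer
    """).fetchall()
    # HAVING is applied after GROUP BY, so it can test the aggregate.
    having_after = conn.execute("""
        SELECT customer, COUNT(*) AS n
        FROM orders
        GROUP BY customer
        HAVING COUNT(*) > 1
        ORDER BY customer
    """).fetchall()
    # An aggregate in WHERE is an error: no groups exist when WHERE is evaluated.
    try:
        conn.execute("SELECT customer FROM orders WHERE COUNT(*) > 1 GROUP BY customer")
        aggregate_in_where_ok = True
    except sqlite3.Error:
        aggregate_in_where_ok = False
    conn.close()
    return where_first, having_after, aggregate_in_where_ok
```

This only demonstrates the logical order; the physical execution plan an engine chooses can be quite different.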

Related

Difference between SQL statements

I have come across two versions of an SQLRPGLE program and saw a change in the code as below:
Before:
Exec Sql SELECT 'N'
INTO :APRFLG
FROM LG751F T1
INNER JOIN LG752F T2
ON T1.ISBOLN = T2.IDBOLN AND
T1.ISITNO = T2.IDMDNO
WHERE T2.IDVIN = :M_VIN AND
T1.ISAPRV <> 'Y';
After:
Exec Sql SELECT case
when T1.ISAPRV <> 'Y' then 'N'
else T1.ISAPRV
end as APRFLG
INTO :APRFLG
FROM LG751F T1
join LG752F T2
ON T1.ISBOLN = T2.IDBOLN AND
T1.ISITNO = T2.IDMDNO
WHERE T2.IDVIN = :M_VIN AND
T1.ISAPRV <> 'Y'
group by T1.ISAPRV;
Could you please tell me whether you see any difference in how the two statements would behave? The second statement has a GROUP BY, which is supposed to be a fix to avoid SQLCODE -811. Apart from this, do you spot any difference?
They are both examples of poor coding IMO.
The requirement to "remove duplicates" is often an indication of a bad statement design and/or a bad DB design.
You appear to be doing an existence check, in which case you should be making use of the EXISTS predicate.
select 'N' into :APRFLG
from sysibm.sysdummy1
where exists (select 1
FROM LG751F T1
INNER JOIN LG752F T2
ON T1.ISBOLN = T2.IDBOLN
AND T1.ISITNO = T2.IDMDNO
WHERE
T2.IDVIN = :M_VIN
AND T1.ISAPRV <> 'Y');
As far as the original two statements, besides the group by, the only real difference is moving columns from the JOIN clause to the WHERE clause. However, the query engine in Db2 for i will rewrite both statements equivalently and come up with the same plan; since an inner join is used.
EDIT: as Mark points out, the JOIN and WHERE are the same in both of the OP's statements. But I'll leave the statement above in as an FYI.
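The difference between the plain join and the EXISTS form can be sketched as follows, using Python's sqlite3 module rather than Db2 for i. The file and column names come from the question; the rows are invented, and SQLite's FROM-less SELECT stands in for sysibm.sysdummy1. The plain join returns one row per match, which is what triggers -811 on a SELECT INTO, while the EXISTS form returns at most one row.

```python
import sqlite3

def approval_flag_checks(vin):
    conn = sqlite3.connect(":memory:")
    # Hypothetical rows: VIN1 matches two unapproved items, VIN2 only an approved one.
    conn.executescript("""
        CREATE TABLE LG751F (ISBOLN INTEGER, ISITNO INTEGER, ISAPRV TEXT);
        CREATE TABLE LG752F (IDBOLN INTEGER, IDMDNO INTEGER, IDVIN TEXT);
        INSERT INTO LG751F VALUES (1, 10, 'N'), (2, 20, 'N'), (3, 30, 'Y');
        INSERT INTO LG752F VALUES (1, 10, 'VIN1'), (2, 20, 'VIN1'), (3, 30, 'VIN2');
    """)
    join_sql = """
        FROM LG751F T1
        INNER JOIN LG752F T2
            ON T1.ISBOLN = T2.IDBOLN AND T1.ISITNO = T2.IDMDNO
        WHERE T2.IDVIN = ? AND T1.ISAPRV <> 'Y'
    """
    # Plain join: one output row per matching pair.
    plain = conn.execute("SELECT 'N' " + join_sql, (vin,)).fetchall()
    # Existence check: a single row if any match exists, otherwise no row.
    exists = conn.execute(
        "SELECT 'N' WHERE EXISTS (SELECT 1 " + join_sql + ")", (vin,)
    ).fetchall()
    conn.close()
    return plain, exists
```

For 'VIN1' the plain join yields two rows and the EXISTS form one; for 'VIN2' both yield none, so the host variable would be left untouched in either version.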
I don't find a compelling difference, other than the addition of the GROUP BY, which will have the effect of suppressing any duplicate rows that might have been output.
It looks like the developer intended for the query to be able to vary its output, sometimes Y and sometimes N, but forgot to remove the WHERE condition (T1.ISAPRV <> 'Y') that forces the first CASE branch to always be taken, and hence the query always outputs N. This kind of pattern is usually seen when the original report includes some spec like "don't include managers in the employee salary report" and that then changes to "actually we want to know if the employee is a manager or not". What was a "where employee not equal manager" becomes a "case when employee is manager then... else..." but the WHERE clause needs removing for it to have any effect.
The INNER keyword has disappeared from the join, but JOIN on its own means INNER JOIN, so this should also be a no-op.
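The point about the WHERE clause defeating the CASE expression can be checked with a small sketch in Python's sqlite3 module. It is simplified to a single table with invented rows, which is an assumption made for brevity; the CASE expression and the GROUP BY are the ones from the "After" statement.

```python
import sqlite3

def approval_flags(keep_where):
    conn = sqlite3.connect(":memory:")
    # Hypothetical rows: two unapproved items and one approved item.
    conn.executescript("""
        CREATE TABLE LG751F (ISBOLN INTEGER, ISAPRV TEXT);
        INSERT INTO LG751F VALUES (1, 'N'), (2, 'N'), (3, 'Y');
    """)
    sql = """
        SELECT CASE WHEN T1.ISAPRV <> 'Y' THEN 'N' ELSE T1.ISAPRV END AS APRFLG
        FROM LG751F T1
    """
    if keep_where:
        # Same filter as in the question: 'Y' rows never reach the CASE expression.
        sql += " WHERE T1.ISAPRV <> 'Y'"
    sql += " GROUP BY T1.ISAPRV ORDER BY APRFLG"
    flags = [row[0] for row in conn.execute(sql)]
    conn.close()
    return flags
```

With the WHERE clause in place the ELSE branch is unreachable and only 'N' comes back; without it, both 'N' and 'Y' appear.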
Another option is just to use fetch first row only like this:
Exec Sql
SELECT 'N'
INTO :APRFLG
FROM LG751F T1
JOIN LG752F T2
ON T1.ISBOLN = T2.IDBOLN AND
T1.ISITNO = T2.IDMDNO
WHERE T2.IDVIN = :M_VIN AND
T1.ISAPRV <> 'Y'
fetch first row only;
That makes it more obvious that you only want a single row rather than trying to use grouping which necessitates the funky do nothing CASE statement. But I do like the EXISTS method Charles provided since that is the real goal, and having exists in there makes it crystal clear.
If your lead insists on GROUP BY, you can also GROUP BY 'N' and still leave out the CASE statement.

Obnoxious WHERE Clause in UPDATE

I found the following WHERE clause in an UPDATE statement, and I hate it. The only way I can think to make it different would be with possibly a CTE and some Unions.
FROM dbo.Table1 T1
INNER JOIN #Table2 T2 ON T1.IntField1 = T2.IntField1
WHERE (ISNULL(T1.IntField2, 0) <> ISNULL(T2.IntField2, 0)
OR ISNULL(T1.IntField3, 0) <> ISNULL(T2.IntField3, 0))
AND (T2.IntField1 IN (
SELECT IntField1
FROM dbo.Table3)
OR T2.IntField1 IS NULL)
I think I've just been staring at this too long. I just happened to look at this SP, and see this. Really felt like something could be done differently/better.
It's not the prettiest, no, but there is no real need to change it unless it is performing badly. Don't ever change SQL code just because you don't like the way it looks; that is often counterproductive, because some of the worst-looking code is the most performant, and the DBA will not thank you for changing their tuned code. Thinking you should change SQL code to suit your personal preferences is a BAD habit you need to break. Read up on performance tuning instead, and refactor to improve performance, not to suit your prejudices about what is pretty (or worse, elegant!) code.
There are two things I can see that might help this though. First why do you need OR T2.IntField1 IS NULL? Since you are joining in an Inner join to table1 on that field, there can never be a result set where T2.IntField1 IS NULL.
The other thing depends on what else #table2 is used for. But since you are clearly creating and populating this table earlier, why not do the conversion of the T2.IntField2 and T2.IntField3 to 0 when they are null at the time the data is put into the table? That would reduce the complexity of the update query some. However, if you need those nulls for some other purpose during the process you can't do this.
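The first point, that an inner join on IntField1 can never produce a row where T2.IntField1 IS NULL, can be confirmed with a short sketch in Python's sqlite3 module. The rows are invented, and the temp table #Table2 is modelled as an ordinary table named Table2, which is an assumption of this sketch. NULL = NULL does not evaluate to true, so rows with NULL keys on both sides still do not join.

```python
import sqlite3

def null_keys_after_inner_join():
    conn = sqlite3.connect(":memory:")
    # Hypothetical data: each table has one row with a NULL join key.
    conn.executescript("""
        CREATE TABLE Table1 (IntField1 INTEGER, IntField2 INTEGER);
        CREATE TABLE Table2 (IntField1 INTEGER, IntField2 INTEGER);
        INSERT INTO Table1 VALUES (1, 5), (NULL, 6);
        INSERT INTO Table2 VALUES (1, 7), (NULL, 8);
    """)
    join_sql = """
        FROM Table1 T1
        INNER JOIN Table2 T2 ON T1.IntField1 = T2.IntField1
    """
    # All joined rows: the NULL-keyed rows are absent.
    joined = conn.execute("SELECT T1.IntField1, T2.IntField1 " + join_sql).fetchall()
    # The OR ... IS NULL branch from the question can therefore never match.
    null_rows = conn.execute(
        "SELECT T2.IntField1 " + join_sql + " WHERE T2.IntField1 IS NULL"
    ).fetchall()
    conn.close()
    return joined, null_rows
```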
It looks like you could combine the elements of the where clause into joins:
Overview:
1) NOT (A AND B) is the same as NOT(A) OR NOT(B)
2) IN OR NULL can be combined in an ISNULL() join.
FROM dbo.Table1 T1
JOIN #Table2 T2 ON T1.IntField1 = T2.IntField1
AND NOT
(
ISNULL(T1.IntField2, 0) = ISNULL(T2.IntField2, 0)
and
ISNULL(T1.IntField3, 0) = ISNULL(T2.IntField3, 0)
)
JOIN dbo.Table3 t3 on
t3.IntField1 = ISNULL(T2.IntField1, t3.IntField1)
But as it's been stated before, if performance is the only focus, this -- although more readable (in my opinion) -- is not necessary.
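The De Morgan step in point 1 can be sanity-checked with a sketch like this one, using Python's sqlite3 module. The rows are invented, #Table2 is modelled as a plain table, and COALESCE stands in for SQL Server's ISNULL; all of these are assumptions of the sketch. It compares the original OR-of-inequalities filter with the NOT-of-equalities join condition and returns the matching IntField1 values for each.

```python
import sqlite3

def changed_ids():
    conn = sqlite3.connect(":memory:")
    # Hypothetical rows: ids 1 and 2 are unchanged (NULL counts as 0), ids 3 and 4 differ.
    conn.executescript("""
        CREATE TABLE Table1 (IntField1 INTEGER, IntField2 INTEGER, IntField3 INTEGER);
        CREATE TABLE Table2 (IntField1 INTEGER, IntField2 INTEGER, IntField3 INTEGER);
        INSERT INTO Table1 VALUES (1, 1, 1), (2, NULL, 2), (3, 3, 3), (4, 4, NULL);
        INSERT INTO Table2 VALUES (1, 1, 1), (2, 0, 2), (3, 9, 3), (4, 4, 5);
    """)
    # Original form: at least one of the two fields differs.
    original = conn.execute("""
        SELECT T1.IntField1
        FROM Table1 T1
        INNER JOIN Table2 T2 ON T1.IntField1 = T2.IntField1
        WHERE COALESCE(T1.IntField2, 0) <> COALESCE(T2.IntField2, 0)
           OR COALESCE(T1.IntField3, 0) <> COALESCE(T2.IntField3, 0)
        ORDER BY T1.IntField1
    """).fetchall()
    # Rewritten form: NOT (both fields are equal), moved into the join condition.
    rewritten = conn.execute("""
        SELECT T1.IntField1
        FROM Table1 T1
        INNER JOIN Table2 T2 ON T1.IntField1 = T2.IntField1
            AND NOT (
                COALESCE(T1.IntField2, 0) = COALESCE(T2.IntField2, 0)
                AND COALESCE(T1.IntField3, 0) = COALESCE(T2.IntField3, 0)
            )
        ORDER BY T1.IntField1
    """).fetchall()
    conn.close()
    return original, rewritten
```

Both forms select the same rows here because the COALESCE calls remove NULLs before the comparison, so ordinary two-valued logic applies.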

How do I update multiple rows (thousands), all with different values, in the least possible time

UPDATE element e1 SET e1.line_number =
(
SELECT t.r FROM
(
select ele,rownum r from
(
select nvl(par.SEQUENCE,ch.SEQUENCE),ch.SEQUENCE,nvl2(par.SEQUENCE,ch.SEQUENCE,0),ch.element_id ele
from element par right join element ch on par.element_id=ch.parent_element_id
where ch.document_id = 78384 order by 1,3,2
)
) t ,element e1
WHERE e1.element_id = t.ele
) WHERE e1.document_id = 78384;
I cannot directly provide the answer you were requesting, but give some hints which may help you:
First of all, you should give us a few hints about your use case. One may guess what's happening in your statement, but it's much easier to help you if you describe what you have and what you need.
You have 3 cascaded selects, which is usually not the best idea. Plus you have an order by in the inner select, which should not be necessary.
Basically you want to filter all rows which need to be updated - try to put that into a select using joins. And always put first the condition which most likely filters out the most rows.
And always when trying to optimize a statement, have a look at the execution plan by using the explain plan command. With that you can see things like usage of full table scans or indexes. If necessary, you can then provide optimizer hints to control the best way of executing the statement.
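As a rough illustration of reading an execution plan, the sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. This is SQLite's analogue, not Oracle's EXPLAIN PLAN command, and the simplified element table and the index name idx_element_document are hypothetical. It compares the plan for the document_id filter before and after an index exists.

```python
import sqlite3

def plan_details(conn, sql):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail text.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

def compare_plans():
    conn = sqlite3.connect(":memory:")
    # Simplified, hypothetical version of the element table from the question.
    conn.execute("""
        CREATE TABLE element (
            element_id INTEGER PRIMARY KEY,
            document_id INTEGER,
            line_number INTEGER
        )
    """)
    sql = "SELECT element_id, line_number FROM element WHERE document_id = 78384"
    before = plan_details(conn, sql)   # no index on document_id yet: full scan
    conn.execute("CREATE INDEX idx_element_document ON element (document_id)")
    after = plan_details(conn, sql)    # the planner can now search via the index
    conn.close()
    return before, after
```

The exact wording of the plan text varies between SQLite versions, so it is safer to look for the index name than to compare whole strings.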

SQL - Relationship between a SubQuery and an Outer Table

Problem
I need to better understand the rules about when I can reference an outer table in a subquery and when (and why) that is an inappropriate request. I've discovered a duplication in an Oracle SQL query I'm trying to refactor but I'm running into issues when I try and turn my referenced table into a grouped subQuery.
The following statement works appropriately:
SELECT t1.*
FROM table1 t1
INNER JOIN table2 t2
on t1.id = t2.id
and t2.date = (SELECT max(date)
FROM table2
WHERE id = t1.id) --This subquery has access to t1
Unfortunately table2 sometimes has duplicate records so I need to aggregate t2 first before I join it to t1. However when I try and wrap it in a subquery to accomplish this operation, suddenly the SQL engine can't recognize the outer table any longer.
SELECT t1.*
FROM table1 t1
INNER JOIN (SELECT *
FROM table2 t2
WHERE t1.id = t2.id --This loses access to t1
and t2.date = (SELECT max(date)
FROM table2
WHERE id = t1.id)) sub on t1.id = sub.id
--Subquery loses access to t1
I know these are fundamentally different queries I'm asking the compiler to put together but I'm not seeing why the one would work but not the other.
I know I can duplicate the table references in my subquery and effectively detach my subquery from the outer table but that seems like a really ugly way of accomplishing this task (what with all the duplication of code and processing).
Helpful References
I found this fantastic description of the order in which clauses are executed in SQL Server: (INNER JOIN ON vs WHERE clause). I'm using Oracle but I would think that this would be standard across the board. There is a clear order to clause evaluation (with FROM being first) so I would think that any clause occurring further down the list would have access to all information previously processed. I can only assume my 2nd query somehow changes that ordering so that my subquery is being evaluated too early?
In addition, I found a similar question asked (Referencing outer query's tables in a subquery) but while the input was good they never really explained why he couldn't do what he is doing and just gave alternative solutions to his problem. I've tried their alternate solutions but it's causing me other issues. Namely, that subquery with the date reference is fundamental to the entire operation so I can't get rid of it.
Questions
I want to understand what I've done here... Why can my initial subquery see the outer table but not after I wrap the entire statement in a subquery?
That said, if what I'm trying to do can't be done, what is the best way of refactoring the first query to eliminate the duplication? Should I reference table1 twice (with all the duplication that requires)? Or is there (probably) a better way of tackling this problem?
Thanks in advance!
------EDIT------
As some have surmised, the queries above are not the actual queries I'm refactoring but an example of the problem I'm running into. The query I'm working with is a lot more complicated, so I'm hesitant to post it here as I'm afraid it will get people off track.
------UPDATE------
So I ran this by a fellow developer and he had one possible explanation for why my subquery is losing access to t1. Because I'm wrapping this subquery in a parenthesis, he thinks that this subquery is being evaluated before my table t1 is being evaluated. This would definitely explain the 'ORA-00904: "t1"."id": invalid identifier' error I've been receiving. It would also suggest that like arithmetic order of operations, that adding parens to a statement gives it priority within certain clause evaluations. I would still love for an expert to weigh in if they agree/disagree that is a logical explanation for what I'm seeing here.
So I figured this out based on the comment that Martin Smith made above (THANKS MARTIN!) and I wanted to make sure I shared my discovery for anyone else who trips across this issue.
Technical Considerations
Firstly, it would certainly help if I used the proper terminology to describe my problem: My first statement above uses a correlated subquery:
http://en.wikipedia.org/wiki/Correlated_subquery
http://www.programmerinterview.com/index.php/database-sql/correlated-vs-uncorrelated-subquery/
This can be a fairly inefficient way of pulling back data, as conceptually the subquery is rerun for every row in the outer table (unless the optimizer is able to unnest it). For this reason I'm going to look for ways of eliminating these types of subqueries in my code:
https://blogs.oracle.com/optimizer/entry/optimizer_transformations_subquery_unesting_part_1
My second statement on the other hand was using what is called an inline view in Oracle also known as a derived table in SQL Server:
http://docs.oracle.com/cd/B19306_01/server.102/b14200/queries007.htm
http://www.programmerinterview.com/index.php/database-sql/derived-table-vs-subquery/
An inline view / derived table creates a temporary unnamed view at the beginning of your query and then treats it like another table until the operation is complete. Because the compiler needs to create a temporary view when it sees one of these subqueries in the FROM clause, those subqueries must be entirely self-contained, with no references outside the subquery.
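The contrast between the two kinds of subquery can be reproduced with a sketch in Python's sqlite3 module. SQLite is used here instead of Oracle, the rows are invented, and the date column is renamed dt to avoid clashing with a keyword; these are assumptions of the sketch. The correlated subquery in the ON clause can see t1, the derived table that references t1 is rejected, and a self-contained derived table that aggregates table2 first gives the same result as the correlated version.

```python
import sqlite3

def latest_rows():
    conn = sqlite3.connect(":memory:")
    # Hypothetical data; "dt" stands in for the question's "date" column.
    conn.executescript("""
        CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE table2 (id INTEGER, dt TEXT);
        INSERT INTO table1 VALUES (1, 'one'), (2, 'two');
        INSERT INTO table2 VALUES (1, '2020-01-01'), (1, '2021-06-01'), (2, '2019-03-01');
    """)
    # Correlated subquery: evaluated per outer row, so t1 is in scope.
    correlated = conn.execute("""
        SELECT t1.id, t2.dt
        FROM table1 t1
        INNER JOIN table2 t2
            ON t1.id = t2.id
           AND t2.dt = (SELECT MAX(dt) FROM table2 WHERE id = t1.id)
        ORDER BY t1.id
    """).fetchall()
    # Derived table referencing the outer alias: rejected, t1 is not visible inside it.
    try:
        conn.execute("""
            SELECT t1.id, sub.dt
            FROM table1 t1
            INNER JOIN (SELECT id, dt FROM table2 t2 WHERE t2.id = t1.id) sub
                ON t1.id = sub.id
        """).fetchall()
        outer_reference_ok = True
    except sqlite3.Error:
        outer_reference_ok = False
    # Self-contained derived table: aggregate table2 on its own, then join.
    derived = conn.execute("""
        SELECT t1.id, sub.max_dt
        FROM table1 t1
        INNER JOIN (SELECT id, MAX(dt) AS max_dt FROM table2 GROUP BY id) sub
            ON t1.id = sub.id
        ORDER BY t1.id
    """).fetchall()
    conn.close()
    return correlated, outer_reference_ok, derived
```

Because the self-contained version groups by id, it also returns one row per id even when table2 holds duplicate rows for the latest date.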
Why what I was doing was stupid
What I was trying to do in that second statement was essentially create a view based on an ambiguous reference to another table that was outside the knowledge of my statement. It would be like trying to reference a field in a table that you hadn't explicitly stated in your query.
Workaround
Lastly, it's worth noting that Martin suggested a fairly clever but ultimately inefficient way to accomplish what I was trying to do. The APPLY operator is proprietary SQL Server syntax, but it allows you to talk to objects outside of your derived table:
http://technet.microsoft.com/en-us/library/ms175156(v=SQL.105).aspx
Likewise this functionality is available in Oracle through different syntax:
What is the equivalent of SQL Server APPLY in Oracle?
Ultimately I'm going to re-evaluate my entire approach to this query, which means I'll have to rebuild it from scratch (believe it or not I didn't create this monstrosity originally - I swear!). A big thanks to everyone who commented - this was definitely stumping me but all of the input helped put me on the right track!
How about the following query:
SELECT t1.* FROM
(
SELECT *
FROM
(
SELECT t2.id,
RANK() OVER (PARTITION BY t2.id ORDER BY t2.date DESC) AS R
FROM table2 t2
)
WHERE R = 1
) sub
INNER JOIN table1 t1
ON t1.id = sub.id
In your second example you are trying to pass the t1 reference down 2 levels. You can't do that; you can only pass it down 1 level (which is why the 1st works). If you give a better example of what you are trying to do, we can help you rewrite your query as well.

Correlated subquery (or equivalent) in an SQLite UPDATE statement?

I have an SQLite3 database which, in order to optimize performance, uses computed columns kept up to date by triggers.
I'm now trying to add a trigger which would be analogous to this (untested but probably valid) SQLAlchemy ORM code
story.read_free = any(link.link_type.read_free for link in story.links)
...but I'm having trouble figuring out how to express that as an UPDATE clause. Here's what I've got so far:
CREATE TRIGGER IF NOT EXISTS update_link_type AFTER UPDATE ON link_types
FOR EACH ROW WHEN old.read_free <> new.read_free BEGIN
UPDATE stories SET
read_free = CASE WHEN (
SELECT 1 FROM links as l, link_types as lt WHERE lt.id = new.id AND l.link_type_id = lt.id AND l.story_id = stories.id
) THEN 1 ELSE 0 END
WHERE id = (SELECT story_id from links as l, link_types as lt WHERE l.link_type_id = lt.id AND lt.id = new.id)
;
END;
My specific problem is that I can't figure out how to ensure that the subquery in the CASE is correlated.
Either SQLite rejects the syntax (things like UPDATE foo AS bar and UPDATE INNER JOIN ... which are apparently how you do it on other DBs) or, as in the example I gave, it's valid, but has the wrong meaning. (In this case, "Set read_free on this story if there exists any link type with read_free, whether or not the story has links of that type".)
If a more clean, concise phrasing of that UPDATE exists beyond simply fixing the problem, I'd also appreciate knowing it. Even if that did work, it'd be a very ugly solution compared to the worst of the rest of my triggers.
Instead of an UPDATE, could you use an INSERT OR REPLACE instead? Unlike UPDATE, INSERT OR REPLACE will accept an embedded SELECT, so you could do the UPDATE foo AS bar or UPDATE INNER JOIN style thing. Your SELECT would just happen to produce duplicates of the rows in stories with just the columns you need changed.
While composing the INSERT OR REPLACE Robie suggested (Using the REPLACE alias to simplify any potential future port to MySQL), I realized that my mind had been stuck in a rut, making wrong assumptions and overcomplicating the problem. (Probably started working on it while sleep deprived and then never questioned my initial conclusions)
I was then able to reformulate my UPDATE to require only a single JOIN (also not supported by SQLite) and then rewrite that as a WHERE subquery.
Here's the final trigger that resulted:
CREATE TRIGGER IF NOT EXISTS update_link_type AFTER UPDATE ON link_types
FOR EACH ROW WHEN old.read_free <> new.read_free BEGIN
UPDATE stories SET read_free = new.read_free
WHERE id IN (SELECT story_id FROM links WHERE link_type_id = new.id)
;
END;
Much cleaner and much more maintainable.
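For anyone who wants to try the final trigger, here is a minimal sketch using Python's sqlite3 module. The trigger text is the one above; the three table definitions and the sample rows are guesses at a minimal schema, not the real one.

```python
import sqlite3

def run_trigger_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical minimal schema; only the names used by the trigger are kept.
        CREATE TABLE link_types (id INTEGER PRIMARY KEY, read_free INTEGER NOT NULL);
        CREATE TABLE stories    (id INTEGER PRIMARY KEY, read_free INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE links      (id INTEGER PRIMARY KEY, story_id INTEGER, link_type_id INTEGER);

        CREATE TRIGGER IF NOT EXISTS update_link_type AFTER UPDATE ON link_types
        FOR EACH ROW WHEN old.read_free <> new.read_free BEGIN
            UPDATE stories SET read_free = new.read_free
            WHERE id IN (SELECT story_id FROM links WHERE link_type_id = new.id);
        END;

        INSERT INTO link_types VALUES (1, 0), (2, 0);
        INSERT INTO stories    VALUES (1, 0), (2, 0), (3, 0);
        -- Story 1 has a link of type 1, story 2 a link of type 2, story 3 has no links.
        INSERT INTO links      VALUES (1, 1, 1), (2, 2, 2);
    """)
    # Flipping link type 1 fires the trigger for that row only.
    conn.execute("UPDATE link_types SET read_free = 1 WHERE id = 1")
    rows = conn.execute("SELECT id, read_free FROM stories ORDER BY id").fetchall()
    conn.close()
    return rows
```

Only story 1 is touched, since it is the only story with a link of the changed type.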
I'm awarding the bounty to Robie for two reasons: First, because I'd have never come up with this answer without him jogging me out of that rut. Second, because if my requirements were as I'd originally believed, his answer would be the best.