SQL Server Execute Order

As far as I know, the order of execution in SQL is
FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY
So I am confused by correlated queries like the code below.
Is the FROM/WHERE clause of the outer query executed first, or the SELECT in the inner query? Can anyone explain? Thanks
SELECT
    *, COUNT(1) OVER(PARTITION BY A) pt
FROM
    (SELECT
         tt.*,
         (SELECT COUNT(id) FROM t WHERE data <= 10 AND ID < tt.ID) AS A
     FROM
         t tt
     WHERE
         data > 10) t1

As I know the order of execute in SQL is FROM-> WHERE-> GROUP BY-> HAVING -> SELECT ->ORDER BY
False. False. False. Presumably what you are referring to is this part of the documentation:
The following steps show the logical processing order, or binding
order, for a SELECT statement. This order determines when the objects
defined in one step are made available to the clauses in subsequent
steps.
As the documentation explains, this refers to the scoping rules when a query is parsed. It has nothing to do with the execution order. SQL Server -- as with almost any database -- reserves the ability to rearrange the query however it likes for processing.
In fact, the execution plan is really a directed acyclic graph (DAG), whose components generally do not have a 1-1 relationship with the clauses in a query. SQL Server is free to execute your query in whatever way it decides is best, so long as it produces the result set that you have described.

Related

Order Of SQL query

I want to find the order in which the parts of this SQL query are executed. Which query is executed first? What will be the result of each intermediate step?
select 0 from
(
select lg.a.aid ,lg.c.number from lg.a
left join lg.c
on lg.a.aid=lg.c.aid
)
as t1
where t1.number is null
There is no "first" or "second" thing that gets executed.
SQL is a descriptive language. The SQL query describes the result set. From a semantic perspective (understanding what the query means), there are rules that say "look at the from clause first, then the where, then the group by, and so on". However, this only describes the parsing phase of the query.
What actually gets executed (in almost all SQL engines) is something called a directed-acyclic graph (DAG). This represents a dataflow of components that do the processing. The SQL compiler and optimizer create the DAG. The relationship between the DAG and the original query is simple: the results from the DAG should be what the query intends.
Your example is:
select 0
from (select lg.a.aid, lg.c.number
from lg.a left join
lg.c
on lg.a.aid = lg.c.aid
) as t1
where t1.number is null;
It is not clear what you mean by your question: "Which query is executed first?" This example has only one query, albeit with a subquery. In any reasonable way of processing the query, the logic for the subquery would need to be executed. However, your query is equivalent to:
select 0
from lg.a left join
lg.c
on lg.a.aid = lg.c.aid
where lg.c.number is null;
And that would have the same execution plan in most databases.
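The equivalence of the two forms can be checked on a small data set. The following is only a sketch: it uses SQLite through Python's standard library as a stand-in for the real database, and the tables a and c (without the lg schema) and their rows are made up for illustration.

```python
import sqlite3

# Hypothetical stand-ins for lg.a and lg.c; SQLite is used only because
# it ships with Python, not because the question is about SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (aid INTEGER PRIMARY KEY);
    CREATE TABLE c (aid INTEGER, number INTEGER);
    INSERT INTO a (aid) VALUES (1), (2), (3);
    INSERT INTO c (aid, number) VALUES (1, 100), (3, NULL);
""")

# Form 1: the LEFT JOIN wrapped in a derived table, as in the question.
with_subquery = conn.execute("""
    SELECT 0
    FROM (SELECT a.aid, c.number
          FROM a LEFT JOIN c ON a.aid = c.aid) AS t1
    WHERE t1.number IS NULL
""").fetchall()

# Form 2: the same logic written without the derived table.
flattened = conn.execute("""
    SELECT 0
    FROM a LEFT JOIN c ON a.aid = c.aid
    WHERE c.number IS NULL
""").fetchall()

print(with_subquery == flattened, len(flattened))
```

Both forms return one row for aid 2 (no match in c) and one for aid 3 (a match whose number is NULL). This only shows that the results agree; whether the plans are identical has to be checked in the execution plan of the database you actually use.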

SQL 'group by' equivalent query

I've seen the following example:
Let T be a table with 2 columns - [id,value] (both int)
Then:
SELECT * FROM T
WHERE id=(SELECT MAX(id) FROM T t2 where T.value=t2.value);
is equivalent to:
SELECT MAX(id) FROM T GROUP BY value
What is going on behind the scenes? How can we refer to T.value?
What is the meaning of T.value=t2.value?
@JuanCarlosOropeza is correct, your premise is false. Those are not equivalent queries: the first returns complete rows (id and value), while the second returns only the maximum id for each value. But more to the point: the purpose of the WHERE clause in the subquery is to restrict the rows in the subquery to those that have the same value as the current row of the outer query.
For what's going on behind the scenes, use the explain plan, which provides information about how the optimizer decides to get the data your query asks for.
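To see how the two queries differ, here is a small sketch using SQLite through Python's standard library. The sample rows are invented for illustration, and an ORDER BY is added to each query only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (id INTEGER, value INTEGER);
    INSERT INTO T (id, value) VALUES (1, 10), (2, 10), (3, 20), (4, 30), (5, 30);
""")

# Correlated subquery: for each outer row, find the largest id among the
# rows that share its value, and keep the outer row only if it is that one.
correlated = conn.execute("""
    SELECT * FROM T
    WHERE id = (SELECT MAX(id) FROM T t2 WHERE T.value = t2.value)
    ORDER BY id
""").fetchall()

# GROUP BY version: one row per value, containing only the maximum id.
grouped = conn.execute("""
    SELECT MAX(id) FROM T GROUP BY value ORDER BY MAX(id)
""").fetchall()

print(correlated)
print(grouped)
```

Inside the subquery the inner table is only visible as t2, so T.value refers to the current row of the outer query. The two queries pick the same ids, but the first returns whole rows and the second returns a single column.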

Oracle FIRST_ROWS optimizer hint

I'm writing a query against what is currently a small table in development. In production, we expect it to grow quite large over the life of the table (the primary key is a number(10)).
My query does a selection for the top N rows of my table, filtered by specific criteria and ordered by date ascending. Essentially, we're assigning records, in bulk, to a specific user for processing. In my case, N will only be 10, 20, or 30.
I'm currently selecting my primary keys inside a subselect, using rownum to limit my results, like so:
SELECT log_number FROM (
SELECT
il2.log_number,
il2.final_date
FROM log il2
INNER JOIN agent A ON A.agent_id = il2.agent_id
INNER JOIN activity lat ON il2.activity_id = lat.activity_id
WHERE (p_criteria1 IS NULL OR A.criteria1 = p_criteria1)
AND lat.criteria2 = p_criteria2
AND lat.criteria3 = p_criteria3
AND il2.criteria3 = p_criteria4
AND il2.current_user IS NULL
GROUP BY il2.log_number, il2.final_date
ORDER BY il2.final_date ASC)
WHERE ROWNUM <= p_how_many;
Although I have a stopkey due to the rownum, I'm wondering if using an Oracle hint here (/*+ FIRST_ROWS(p_how_many) */) on the inner select will affect the query plan in the future. I'd like to know more about what the database does when this hint is specified; does it actually make a difference if you have to order the table? (Seems like it wouldn't.) Or does it only affect the select portion, after the access and join parts?
Looking at the explain plan now doesn't get me much as the table hasn't grown yet.
Thanks for your help!
Even with an ORDER BY, different execution plans could be selected when you limit the number of rows returned. It can be easier to select the top n rows by some order key, then sort those, than to sort the entire table then select the top n rows.
However, the GROUP BY is likely to restrict the benefit of this sort of optimization. Grouping (or a DISTINCT operation) generally prevents the optimizer from using a plan that can pipe individual rows into a STOPKEY operation.
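As a rough analogy for the top-n point above (this is plain Python, not what Oracle does internally), selecting the n smallest items can avoid sorting the whole input. The data and the (final_date, log_number) tuple layout are made up for illustration.

```python
import heapq
import random

random.seed(0)
# Hypothetical rows as (final_date, log_number) tuples; dates are just integers here.
rows = [(random.randint(0, 10_000), log_number) for log_number in range(1000)]
n = 10

# Strategy 1: sort everything, then keep the first n rows.
full_sort = sorted(rows)[:n]

# Strategy 2: keep only the n smallest rows while scanning, then return them sorted.
top_n = heapq.nsmallest(n, rows)

print(full_sort == top_n)
```

Both strategies give the same answer, but the second never needs to hold a fully sorted copy of the input, which is the kind of saving a stopkey-aware plan is after.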

Filtering on ROW_NUMBER() is changing the results

I implemented an OData service of my own that takes an SQL statement and applies the top / skip filter using ROW_NUMBER(). Most statements tested so far work well, except for a statement involving 2 levels of LEFT JOIN. For some reason I can't explain, the data returned by the SQL changes when I apply a WHERE clause on the row number column.
For readability (and testing), I removed most of the sql to keep only the faulty part. Basically, you have a Patients table that may have 0 to N Diagnostics and the Diagnostics may have 0 to N Treatments:
SELECT RowNumber, PatientID, DiagnosticID, TreatmentID
FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNumber
, *
FROM PATIENTS
LEFT JOIN DIAGNOSTICS ON DIAGNOSTICS.PatientID = PATIENTS.PatientID
LEFT JOIN TREATMENTS ON TREATMENTS.DiagnosticID = DIAGNOSTICS.DiagnosticID
) AS Wrapper
--WHERE RowNumber BETWEEN 1 AND 10
--If I uncomment the line above, I'll get 10 lines that differ from the first 10 lines of this query
These are the results I got from the statement above. The result on the left shows the first 10 rows without the WHERE clause, while the one on the right shows the results with the WHERE clause.
For the record, I'm using SQL Server 2008 R2 SP3. My application is in C# but the problem occurs in SQL server too so I don't think .NET is involved in this case.
EDIT
About the ORDER BY (SELECT NULL), I took that code a while ago from this SO question. However, an order by null will work only if the statement is sorted... in my case, I forgot about adding an order by clause so that's why I was getting some random sorting.
Let me first ask: why do you expect it to be the same? Or rather, why do you expect it to be anything in particular? You haven't imposed an ordering, so the query optimizer is free to use whatever execution operators are most efficient (according to its cost scheme). When you add the WHERE clause, the plan will change and the natural ordering of the results will be different. This can also happen when adding joins or subqueries, for example.
If you want the results to come back in a specific order, you need to actually use the ORDER BY subclause of the ROW_NUMBER() window function. I'm not sure why you are ordering by SELECT NULL, but I can guarantee you that's the problem.
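A minimal sketch of the fix, assuming the three key columns together identify a joined row uniquely. It runs on SQLite through Python's standard library (window functions need SQLite 3.25 or later) rather than on SQL Server, and the tables and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Patients (PatientID INTEGER);
    CREATE TABLE Diagnostics (DiagnosticID INTEGER, PatientID INTEGER);
    CREATE TABLE Treatments (TreatmentID INTEGER, DiagnosticID INTEGER);
    INSERT INTO Patients VALUES (1), (2), (3);
    INSERT INTO Diagnostics VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO Treatments VALUES (100, 10), (101, 10), (102, 12);
""")

# Number the rows by an explicit, unique ordering instead of (SELECT NULL).
NUMBERED = """
    SELECT ROW_NUMBER() OVER (
               ORDER BY p.PatientID, d.DiagnosticID, t.TreatmentID
           ) AS RowNumber,
           p.PatientID, d.DiagnosticID, t.TreatmentID
    FROM Patients p
    LEFT JOIN Diagnostics d ON d.PatientID = p.PatientID
    LEFT JOIN Treatments t ON t.DiagnosticID = d.DiagnosticID
"""

all_rows = conn.execute(
    f"SELECT * FROM ({NUMBERED}) AS Wrapper ORDER BY RowNumber"
).fetchall()

# Filtering on RowNumber now returns a stable slice of the full result.
page = conn.execute(
    f"SELECT * FROM ({NUMBERED}) AS Wrapper "
    "WHERE RowNumber BETWEEN 2 AND 4 ORDER BY RowNumber"
).fetchall()

print(page == all_rows[1:4])
```

Because the ORDER BY inside the window function is a total order over the joined rows, the row numbers no longer depend on which plan the optimizer picks, so adding the WHERE clause cannot change which rows get which number.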

SQL EXISTS Why does selecting rownum cause inefficient execution plan?

Problem
I'm trying to understand why what seems like a minor difference in these two Oracle Syntax Update queries is causing a radically different execution plan.
Query 1:
UPDATE sales s
SET status = 'DONE', trandate = sysdate
WHERE EXISTS (Select *
FROM tempTable tmp
WHERE s.key1 = tmp.key1
AND s.key2 = tmp.key2
AND s.key3 = tmp.key3)
Query 2:
UPDATE sales s
SET status = 'DONE', trandate = sysdate
WHERE EXISTS (Select rownum
FROM tempTable tmp
WHERE s.key1 = tmp.key1
AND s.key2 = tmp.key2
AND s.key3 = tmp.key3)
As you can see the only difference between the two is that the subquery in Query 2 returns a rownum instead of the values of every row.
The execution plans for these two couldn't be more different:
Query1 - Pulls the total results from both tables and uses a sort and a hash join to return the results. This performs well with a favorable 2,346 cost (despite the use of the EXISTS clause and the correlated subquery).
Query2 - Pulls both table results as well but uses a count and a filter to accomplish the same task and returns an execution plan with an astonishing 77,789,696 cost! I should note that this query just hangs on me so I'm not actually positive this returns the same results (though I believe it should).
From my understanding of the Exists clause it is just a simple boolean check that runs per line of the main table. It doesn't matter if a single row is returned in my EXISTS condition or 100,000 rows... if any results are returned for the row that it is being run, then you've passed the exist check. So why would it matter what my subquery SELECT statement returns?
--------------------EDIT----------------------
Per request, below are the execution plans I'm running in TOAD... please note I edited the table names in my example above for ease - In these plans ALSS_SALES2 = sales above and SALESEXT_TMP = tempTABLE above.
Also should have mentioned but neither of the two tables has indices at this point.. I haven't yet added them to my tempTable and I'm testing with a cheap copy of the sales table which only contains the fields and data but no indices, constraints or security.
Thanks for the assistance everyone!
Query 1 Execution Plan
Query 2 Execution Plan
------------------------------------------------
Questions
1) Why did the call for rownum cause the execution plan to change?
2) What is it about the Filter that is so incredibly inefficient?
3) Am I missing something fundamental with the way the Exists clause works that is causing this change?
Posting the actual query plans would be quite helpful.
In general, though, when the optimizer sees a subquery with rownum, that radically limits its ability to transform the query and merge the results from the subquery with the main query because doing so potentially affects the results. That can be a quick way to force Oracle to materialize a subquery if that happens to be more efficient than the plan chosen by the optimizer. In this case, though, it is probably causing the optimizer to forego a transform step that makes the query more efficient.
Occasionally, you'll see someone take a query like
SELECT b.*
FROM (SELECT <<columns>>
FROM driving_table
WHERE <<conditions>>) a,
b
WHERE a.id = b.id
and tack on a rownum to the a subquery
SELECT b.*
FROM (SELECT <<columns>>, rownum
FROM driving_table
WHERE <<conditions>>) a,
b
WHERE a.id = b.id
in order to force the optimizer to evaluate the a subquery before executing the join. Normally, of course, the optimizer should do this by default if it is more efficient. But if the optimizer makes a mistake, adding rownum can be quicker than figuring out the right set of hints to force a plan or digging in to the underlying problem to figure out the right solution.
Of course, in the particular case that you have a subquery in a WHERE EXISTS where the only use of rownum comes in the SELECT list, we humans can detect that the rownum shouldn't prevent any query transform step that the optimizer would care to use. The optimizer, though, is probably using a more general rule that says that subqueries that reference a function like rownum must be completely executed (this may depend on the exact Oracle version and/or the optimizer settings). So the optimizer is realistically doing a bunch of extra work because it's not smart enough to recognize that the rownum you added cannot possibly affect the results of the query.
Just a question, what's the execution plan for this query:
UPDATE sales s
SET status = 'DONE', trandate = sysdate
WHERE EXISTS (Select NULL
FROM tempTable tmp
WHERE s.key1 = tmp.key1
AND s.key2 = tmp.key2
AND s.key3 = tmp.key3);
This shows what an EXISTS (...) expression actually needs from its select list: nothing! As already stated, Oracle only has to check whether anything is returned, not what the subquery returns.
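The point that the select list of an EXISTS subquery does not affect the result can be checked with a small sketch. It uses SQLite through Python's standard library rather than Oracle, so it says nothing about Oracle's plans or about rownum (which SQLite does not have); the rows and the run_update helper are invented for illustration.

```python
import sqlite3

def run_update(select_list):
    """Run the UPDATE with the given EXISTS select list on fresh sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales (key1 INTEGER, key2 INTEGER, key3 INTEGER, status TEXT);
        CREATE TABLE tempTable (key1 INTEGER, key2 INTEGER, key3 INTEGER);
        INSERT INTO sales VALUES (1, 1, 1, 'NEW'), (2, 2, 2, 'NEW'), (3, 3, 3, 'NEW');
        INSERT INTO tempTable VALUES (1, 1, 1), (1, 1, 1), (3, 3, 3);
    """)
    conn.execute(f"""
        UPDATE sales
        SET status = 'DONE'
        WHERE EXISTS (SELECT {select_list}
                      FROM tempTable tmp
                      WHERE sales.key1 = tmp.key1
                        AND sales.key2 = tmp.key2
                        AND sales.key3 = tmp.key3)
    """)
    return conn.execute("SELECT key1, status FROM sales ORDER BY key1").fetchall()

with_star = run_update("*")
with_null = run_update("NULL")

print(with_star == with_null, with_null)
```

Rows 1 and 3 are updated in both runs, and the duplicate match for row 1 makes no difference, since EXISTS only asks whether at least one row comes back.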