What is the order of execution for this SQL statement - sql

I have below SQL Query :
SELECT TOP 5 C.CustomerID,C.CustomerName,C.CustomerSalary
FROM Customer C
WHERE C.CustomerSalary > 10000
ORDER BY C.CustomerSalary DESC
What will be execution order of the following with proper explanation ?
TOP Clause
WHERE Clause
ORDER BY Clause

Check out the documentation for the SELECT statement, in particular this section:
Logical Processing Order of the SELECT statement
The following steps show the logical processing order, or binding
order, for a SELECT statement. This order determines when the objects
defined in one step are made available to the clauses in subsequent
steps. For example, if the query processor can bind to (access) the
tables or views defined in the FROM clause, these objects and their
columns are made available to all subsequent steps. Conversely,
because the SELECT clause is step 8, any column aliases or derived
columns defined in that clause cannot be referenced by preceding
clauses. However, they can be referenced by subsequent clauses such as
the ORDER BY clause. Note that the actual physical execution of the
statement is determined by the query processor and the order may vary
from this list.
which gives the following order:
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP

WHERE
ORDER BY
TOP
Here is a good article about that: http://blog.sqlauthority.com/2009/04/06/sql-server-logical-query-processing-phases-order-of-statement-execution/

Simply remember this phrase:-
Fred Jones' Weird Grave Has Several Dull Owls
Take the first letter of each word, and you get this:-
FROM
(ON)
JOIN
WHERE
GROUP BY
(WITH CUBE or WITH ROLLUP)
HAVING
SELECT
DISTINCT
ORDER BY
TOP
Hope that helps.

This is exact execution order, with your case.
1-FROM
2-WHERE
3-SELECT
4-ORDER BY
5-TOP

TOP, WHERE, and ORDER BY are not "executed" - they simply describe the desired result and the database query optimizer determines (hopefully) the best plan for the actual execution. The separation between "declaring the desired result" and how it is physically achieved is what makes SQL a "declarative" language.
Assuming there is an index on CustomerSalary, and the table is not clustered, your query will likely be executed as an index seek + table heap access, as illustrated in this SQL Fiddle (click on View Execution Plan at the bottom):
As you can see, first the correct CustomerSalary value is found through the Index Seek, then the row that value belongs to is retrieved from the table heap through RID Lookup (Row ID Lookup). The Top is just for show here (and has 0% cost), as is the Nested Loops for that matter - the starting index seek will return (at most) one row in any case. The whole query is rather efficient and will likely cost only a few I/O operations.
If the table is clustered, you'll likely have another index seek instead of the table heap access, as illustrated in this SQL Fiddle (note the lack of NONCLUSTERED keyword in the DDL SQL):
But beware: I was lucky this time to get the "right" execution plan. The query optimizer might have chosen a full table scan, which is sometimes actually faster on small tables. When analyzing query plans, always try to do that on realistic amounts of data!

Visit https://msdn.microsoft.com/en-us/library/ms189499.aspx for a better explanation.
The following steps show the logical processing order, or binding order, for a SELECT statement. This order determines when the objects defined in one step are made available to the clauses in subsequent steps. For example, if the query processor can bind to (access) the tables or views defined in the FROM clause, these objects and their columns are made available to all subsequent steps. Conversely, because the SELECT clause is step 8, any column aliases or derived columns defined in that clause cannot be referenced by preceding clauses. However, they can be referenced by subsequent clauses such as the ORDER BY clause. Note that the actual physical execution of the statement is determined by the query processor and the order may vary from this list.
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP

My $0,02 here.
There's two different concepts in action here: the logical execution order and the plan of query execution. An other was to see it is who answers the following questions:
How MSSQL understood my SQL Query?
What it'll do to execute it in the best possible way given the current schema and data?
The first question is answered by the logical execution order. Brian's answer show what it is. It's the way SQL understood your command: "FROM Customer table (aliased as C) consider only the rows WHERE the C.CustomerSalary > 10000, ORDER them BY C.CustomerSalary in descendent order and SELECT the columns listed for the TOP 5 rows". The resultset will obey that meaning
The second question's answer is the query execution plan - and it depends on your schema (table definitions, selectivity of data, quantity of rows in the customer table, defined indexes, etc) since is heavily dependant of SQL Server optimizer internal workings.

Here is the complete sequence for sql server :
1. FROM
2. ON
3. JOIN
4. WHERE
5. GROUP BY
6. WITH CUBE or WITH ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10. ORDER BY
11. TOP
So from the above list, you can easily understand the execution sequence of TOP, WHERE and ORDER BY which is :
1. WHERE
2. ORDER BY
3. TOP
Get more information about it from Microsoft

Related

Strange issue with the Order By --SQL

Few days ago I came across a strange problem with the Order By , While creating a new table I used
Select - Into - From and Order By (column name)
and when I open that table see tables are not arranged accordingly.
I re-verified it multiple times to make sure I am doing the right thing.
One more thing I would like to add is till the time I don't use INTO, I can see the desired result but as soon as I create new table, I see there is no Order for tht column. Please help me !
Thanks in advance.. Before posting the question I did research for 3 days but no solution yet
SELECT
[WorkOrderID], [ProductID], [OrderQty], [StockedQty]
INTO
[AdventureWorks2012].[Production].[WorkOrder_test]
FROM
[AdventureWorks2012].[Production].[WorkOrder]
ORDER BY
[StockedQty]
SQL 101 for beginners: SELECT statements have no defined order unless you define one.
When i open that table
That likely issues a SELECT (TOP 1000 IIFC) without order.
While creating a new table i used Select - Into - From and Order By (column name)
Which sort of is totally irrelevant - you basically waste performance ordering the input data.
You want an order in a select, MAKE ONE by adding an order by clause to the select. The table's internal order is by clustered index, but an query can return results in any order it wants. Fundamental SQL issue, as I said in the first sentence. Any good book on sql covers that in one of the first chapters. SQL uses a set approach, sets have no intrinsic order.
Firstly T-SQL is a set based language and sets don't have orders. More over it also doesn't mean serial execution of commands i.e, the above query is not executed in sequence written but the processing order for a SELECT statement is as:
1.FROM
2.ON
3.JOIN
4.WHERE
5.GROUP BY
6.WITH CUBE or WITH ROLLUP
7.HAVING
8.SELECT
9.DISTINCT
10.ORDER BY
Now when you execute your query without into selected column data gets ordered based on the condition specified in 'Order By' clause but when Into is used format of new_table is determined by evaluating the expressions in the select list.(Remember order by clause has not been evaluated yet).
The columns in new_table are created in the order specified by the select list but rows cannot be ordered. It's a limitation of Into clause you can refer here:
Specifying an ORDER BY clause does not guarantee the rows are inserted
in the specified order.

how does SELECT TOP works when no order by is specified?

The msdn documentation says that when we write
SELECT TOP(N) ..... ORDER BY [COLUMN]
We get top(n) rows that are sorted by column (asc or desc depending on what we choose)
But if we don't specify any order by, msdn says random as Gail Erickson pointed out here. As he points out it should be unspecified rather then random. But as
Thomas Lee points out there that
When TOP is used in conjunction with the ORDER BY clause, the result
set is limited to the first N number of ordered rows; otherwise, it
returns the first N number of rows ramdom
So, I ran this query on a table that doesn't have any indexes, first I ran this..
SELECT *
FROM
sys.objects so
WHERE
so.object_id NOT IN (SELECT si.object_id
FROM
sys.index_columns si)
AND so.type_desc = N'USER_TABLE'
And then in one of those tables, (in fact I tried the query below in all of those tables returned by above query) and I always got the same rows.
SELECT TOP (2) *
FROM
MstConfigSettings
This always returned the same 2 rows, and same is true for all other tables returned by query 1. Now the execution plans shows 3 steps..
As you can see there is no index look up, it's just a pure table scan, and
The Top shows actual no of rows to be 2, and so does the Table Scan; Which is not the case (there I many rows).
But when I run something like
SELECT TOP (2) *
FROM
MstConfigSettings
ORDER BY
DefaultItemId
The execution plan shows
and
So, when I don't apply ORDER BY the steps are different (there is no sort). But the question is how does this TOP works when there is no Sort and why and how does it always gives the same result?
There is no guarantee which two rows you get. It will just be the first two retrieved from the table scan.
The TOP iterator in the execution plan will stop requesting rows once two have been returned.
Likely for a scan of a heap this will be the first two rows in allocation order but this is not guaranteed. For example SQL Server might use the advanced scanning feature which means that your scan will read pages recently read from another concurrent scan.

Query Optimisation (ORDER BY)

I have an SQL query similar to below:
SELECT NAME,
MY_FUNCTION(NAME) -- carries out some string manipulation
FROM TITLES
ORDER BY NAME; -- has an index.
The TITLES table has approximately 12,000 records. At the moment the query takes over 5 minutes to execute but if I remove the ORDER BY clause then it executes within a couple of seconds.
Does anyone have any suggestions on how to be speed up this query.
If MY_FUNCTION is deterministic (i.e. always returns the same result for the same input value) then you could create an index on (NAME, MY_FUNCTION(NAME)) and it may help (or may not!)
In comments under the question, you say that it takes 2 seconds "to return N rows without the ORDER BY". That makes sense: without the ORDER BY you will just get the first N rows encountered, as soon as they are encountered. With the ORDER BY, the first N rows are returned only after the results have been sorted into the correct order.
If the query is being used in a situation where getting the first N rows fast is important (e.g. an online report with pagination) then you could try adding a FIRST_ROWS or FIRST_ROWS_n hint to the query, to try to persuade it to use the index. See Choosing an Optimizer Goal
Use the EXPLAIN statement to see where the issue is
EXPLAIN SELECT NAME, MY_FUNCTION(NAME) FROM TITLES ORDER BY NAME;
Sounds weird. What's name column type?
Have you checked for defective hardware errors? Maybe (just maybe) your query with the order by clause is using your index, and your index is located in a defective disk (it could be in a different disk from the table if they are located in different tablespaces).

Does the order of the columns in a SELECT statement make a difference?

This question was inspired by a previous question posted on SO, "Does the order of the WHERE clause make a differnece?". Would it improve a SELECT statement's performance if the the columns used in the WHERE section are placed at the begining of the SELECT statement?
example:
SELECT customer.id,
transaction.id,
transaction.efective_date,
transaction.a,
[...]
FROM customer, transaction
WHERE customer.id = transaction.id;
I do know that limiting the list of columns to only the needed ones in a SELECT statement improves performance as opposed to using SELECT * because the current list is smaller.
For Oracle and Informix and any other self-respecting DBMS, the order of the columns should have no impact on performance. Similarly, it should be the case that the query engine finds the optimal order to process the Where clause so the order should not matter all things being equal (i.e., looking past constructs which might force an execution order).

FreeText Query is slow - includes TOP and Order By

The Product table has 700K records in it. The query:
SELECT TOP 1 ID,
Name
FROM Product
WHERE contains(Name, '"White Dress"')
ORDER BY DateMadeNew desc
takes about 1 minute to run. There is an non-clustered index on DateMadeNew and FreeText index on Name.
If I remove TOP 1 or Order By - it takes less then 1 second to run.
Here is the link to execution plan.
http://screencast.com/t/ZDczMzg5N
Looks like FullTextMatch has over 400K executions. Why is this happening? How can it be made faster?
UPDATE 5/3/2010
Looks like cardinality is out of whack on multi word FreeText searches:
Optimizer estimates that there are 28K records matching 'White Dress', while in reality there is only 1.
http://screencast.com/t/NjM3ZjE4NjAt
If I replace 'White Dress' with 'White', estimated number is '27,951', while actual number is '28,487' which is a lot better.
It seems like Optimizer is using only the first word in phrase being searched for cardinality.
Looks like FullTextMatch has over 400K executions. Why is this happening?
Since you have an index combined with TOP 1, optimizer thinks that it will be better to traverse the index, checking each record for the entry.
How can it be made faster?
If updating the statistics does not help, try adding a hint to your query:
SELECT TOP 1 *
FROM product pt
WHERE CONTAINS(name, '"test1"')
ORDER BY
datemadenew DESC
OPTION (HASH JOIN)
This will force the engine to use a HASH JOIN algorithm to join your table and the output of the fulltext query.
Fulltext query is regarded as a remote source returning the set of values indexed by KEY INDEX provided in the FULLTEXT INDEX definition.
Update:
If your ORM uses parametrized queries, you can create a plan guide.
Use Profiler to intercept the query that the ORM sends verbatim
Generate a correct plan in SSMS using hints and save it as XML
Use sp_create_plan_guide with an OPTION USE PLAN to force the optimizer always use this plan.
Edit
From http://technet.microsoft.com/en-us/library/cc721269.aspx#_Toc202506240
The most important thing is that the
correct join type is picked for
full-text query. Cardinality
estimation on the FulltextMatch STVF
is very important for the right plan.
So the first thing to check is the
FulltextMatch cardinality estimation.
This is the estimated number of hits
in the index for the full-text search
string. For example, in the query in
Figure 3 this should be close to the
number of documents containing the
term ‘word’. In most cases it should
be very accurate but if the estimate
was off by a long way, you could
generate bad plans. The estimation for
single terms is normally very good,
but estimating multiple terms such as
phrases or AND queries is more complex
since it is not possible to know what
the intersection of terms in the index
will be based on the frequency of the
terms in the index. If the cardinality
estimation is good, a bad plan
probably is caused by the query
optimizer cost model. The only way to
fix the plan issue is to use a query
hint to force a certain kind of join
or OPTIMIZE FOR.
So it simply cannot know from the information it stores whether the 2 search terms together are likely to be quite independent or commonly found together. Maybe you should have 2 separate procedures one for single word queries that you let the optimiser do its stuff on and one for multi word procedures that you force a "good enough" plan on (sys.dm_fts_index_keywords might help if you don't want a one size fits all plan).
NB: Your single word procedure would likely need the WITH RECOMPILE option looking at this bit of the article.
In SQL Server 2008 full-text search we have the ability to alter the plan that is generated based on a cardinality estimation of the search term used. If the query plan is fixed (as it is in a parameterized query inside a stored procedure), this step does not take place. Therefore, the compiled plan always serves this query, even if this plan is not ideal for a given search term.
Original Answer
Your new plan still looks pretty bad though. It looks like it is only returning 1 row from the full text query part but scanning all 770159 rows in the Product table.
How does this perform?
CREATE TABLE #tempResults
(
ID int primary key,
Name varchar(200),
DateMadeNew datetime
)
INSERT INTO #tempResults
SELECT
ID, Name, DateMadeNew
FROM Product
WHERE contains(Name, '"White Dress"')
SELECT TOP 1
*
FROM #tempResults
ORDER BY DateMadeNew desc
I can't see the linked execution plan, network police are blocking that, so this is just a guess...
if it is running fast without the TOP and ORDER BY, try doing this:
SELECT TOP 1
*
FROM (SELECT
ID, Name, DateMadeNew
FROM Product
WHERE contains(Name, '"White Dress"')
) dt
ORDER BY DateMadeNew desc
A couple of thoughts on this one:
1) Have you updated the statistics on the Product table? It would be useful to see the estimates and actual number of rows on the operations there too.
2) What version of SQL Server are you using? I had a similar issue with SQL Server 2008 that turned out to be nothing more than not having Service Pack 1 installed. Install SP1 and a FreeText query that was taking a couple of minutes (due to a huge number of actual executions against actual) went down to taking a second.
I had the same problem earlier.
The performance depends on which unique index you choose for full text indexing.
My table has two unique columns - ID and article_number.
The query:
select top 50 id, article_number, name, ...
from ARTICLE
CONTAINS(*,'"BLACK*" AND "WHITE*"')
ORDER BY ARTICLE_NUMBER
If the full text index is connected to ID then it is slow depending on the searched words.
If the full text index is connected to ARTICLE_NUMBER UNIQUE index then it was always fast.
I have better solution.
I. Let's first overview proposed solutions as they also may be used in some cases:
OPTION (HASH JOIN) - is not good as you may get error "Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN."
SELECT TOP 1 * FROM (ORIGINAL_SELECT) ORDER BY ... - is not good, when you need to use paginating results from you ORIGINAL_SELECT
sp_create_plan_guide - is not good, as to use plan_guide you have to save plan for specific sql statement, this won't work for dynamic sql statements (e.g. generated by ORM)
II. My Solution contains of two parts
1. Self join table used for Full Text search
2. Use MS SQL HASH Join Hints MSDN Join Hints
Your SQL :
SELECT TOP 1 ID, Name FROM Product WHERE contains(Name, '"White Dress"')
ORDER BY DateMadeNew desc
Should be rewritten as :
SELECT TOP 1 p.ID, p.Name FROM Product p INNER HASH JOIN Product fts ON fts.ID = p.ID
WHERE contains(fts.Name, '"White Dress"')
ORDER BY p.DateMadeNew desc
If you are using NHibernate with/without Castle Active Records, I've replied in post how to write interceptor to modify your query to replace INNER JOIN by INNER HASH JOIN