How accurate is Oracle's EXPLAIN PLAN?

Are there any good ways to objectively measure a query's performance in Oracle 10g? There's one particular query that I've been tuning for a few days. I've gotten a version that seems to be running faster (at least based on my initial tests), but the EXPLAIN cost is roughly the same.
How likely is it that the EXPLAIN cost is missing something?
Are there any particular situations where the EXPLAIN cost is disproportionately different from the query's actual performance?
I used the first_rows hint on this query. Does this have an impact?

How likely is it that the EXPLAIN cost is missing something?
Very unlikely. In fact, it would be a level 1 bug :)
Actually, if your statistics have changed significantly since the time you ran the EXPLAIN, the actual query plan will differ. But as soon as the query is compiled, the plan will remain the same.
Note EXPLAIN PLAN may show you things that are likely to happen but may never happen in an actual query.
Like, if you run an EXPLAIN PLAN on a hierarchical query:
SELECT *
FROM mytable  -- "table" is a reserved word; any real table name works here
START WITH
id = :startid
CONNECT BY
parent = PRIOR id
with indexes on both id and parent, you will see an extra FULL TABLE SCAN which most probably will not happen in real life.
Use stored outlines to store and reuse the plan, no matter what.
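For example, a minimal sketch of pinning that plan with a stored outline (the outline and category names are hypothetical):

create or replace outline hier_outline
for category tuned_queries
on
select * from mytable
start with id = :startid
connect by parent = prior id;

-- have the session pick up outlines from that category
alter session set use_stored_outlines = tuned_queries;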
Are there any particular situations where the EXPLAIN cost is disproportionately different from the query's actual performance?
Yes, it happens very often on complicated queries.
CBO (cost based optimizer) uses calculated statistics to evaluate query time and choose optimal plan.
If you have lots of JOINs, subqueries and other such things in your query, its algorithm cannot predict exactly which plan will be faster, especially when you hit memory limits.
Here's the particular situation you asked about: a HASH JOIN, for instance, will need several passes over the probe table if the hash table does not fit into the work area limited by pga_aggregate_target, but as of Oracle 10g, I don't remember this ever being taken into account by the CBO.
That's why I hint every query I expect to run for more than 2 seconds in a worst case.
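As an illustration, a hedged sketch of the kind of hinting I mean, forcing a join order and join method (the tables and aliases are hypothetical):

select /*+ leading(o) use_hash(l) */
       o.order_id, l.line_no
from   orders o
join   order_lines l on l.order_id = o.order_id;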
I used the first_rows hint on this query. Does this have an impact?
This hint makes the optimizer use a plan with a lower response time: it returns the first rows as soon as possible, even if the overall query time is larger.
Practically, it almost always means using NESTED LOOPS joins instead of HASH JOINs.
NESTED LOOPS joins have poorer overall performance on large datasets, but they return the first rows faster (since no hash table needs to be built).
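For illustration, a minimal sketch of the hint in a query that only needs the first page of results (table and column names are hypothetical):

select /*+ first_rows(10) */
       o.order_id, o.created_at
from   orders o
order  by o.created_at desc;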
As for the query from your original question, see my answer here.

Q: Are there any good ways to objectively measure a query's performance in Oracle 10g?
Oracle tracing is the best way to measure performance. Execute the query and let Oracle instrument the execution. In the SQL*Plus environment, it's very easy to use AUTOTRACE.
http://asktom.oracle.com/tkyte/article1/autotrace.html (article moved)
http://tkyte.blogspot.com/2007/04/when-explanation-doesn-sound-quite.html
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:5671636641855
And enabling Oracle trace in other environments isn't that difficult.
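A minimal SQL*Plus AUTOTRACE session might look like this (the query itself is hypothetical, and AUTOTRACE assumes the PLUSTRACE role has been set up):

set autotrace traceonly explain statistics
select * from employees where department_id = 10;
set autotrace off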
Q: There's one particular query that I've been tuning for a few days. I've gotten a version that seems to be running faster (at least based on my initial tests), but the EXPLAIN cost is roughly the same.
The actual execution of the statement is what needs to be measured. EXPLAIN PLAN does a decent job of predicting the optimizer plan, but it doesn't actually measure the performance.
Q: 1. How likely is it that the EXPLAIN cost is missing something?
Not very likely, but I have seen cases where EXPLAIN PLAN comes up with a different plan than the optimizer.
Q: 2. Are there any particular situations where the EXPLAIN cost is disproportionately different from the query's actual performance?
The short answer is that I've not observed any. But then again, there's not really a direct correlation between the EXPLAIN PLAN cost and the actual observed performance. It's possible for EXPLAIN PLAN to give a really high number for cost, but to have the actual query run in less than a second. EXPLAIN PLAN does not measure the actual performance of the query, for that you need Oracle trace.
Q: 3. I used the first_rows hint on this query. Does this have an impact?
Any hint (like /*+ FIRST_ROWS */) may influence which plan is selected by the optimizer.
The "cost" returned by the EXPLAIN PLAN is relative. It's an indication of performance, but not an accurate gauge of it. You can't translate a cost number into a number of disk operations or a number of CPU seconds or number of wait events.
Normally, we find that a statement with an EXPLAIN PLAN cost shown as 1 is going to run "very quickly", and a statement with an EXPLAIN PLAN cost on the order of five or six digits is going to take more time to run. But not always.
What the optimizer is doing is comparing a lot of possible execution plans (full table scan, using an index, nested loop join, etc.). The optimizer assigns a number to each plan, then selects the plan with the lowest number.
I have seen cases where the optimizer plan shown by EXPLAIN PLAN does NOT match the actual plan used when the statement is executed. I saw that a decade ago with Oracle8, particularly when the statement involved bind variables, rather than literals.
To get an actual cost for statement execution, turn on tracing for your statement.
The easiest way to do this is with SQL*Plus AUTOTRACE.
http://asktom.oracle.com/tkyte/article1/autotrace.html
Outside the SQL*Plus environment, you can turn on Oracle tracing:
alter session set timed_statistics = true;
alter session set tracefile_identifier = 'here_is_my_session';
alter session set events '10046 trace name context forever, level 12';  -- level 12 captures binds and waits
--alter session set events '10053 trace name context forever, level 1';
select /*-- your_statement_here --*/ ...
alter session set events '10046 trace name context off';
--alter session set events '10053 trace name context off';
This puts a trace file into the user_dump_dest directory on the server. The tracefile produced will have the statement plan AND all of the wait events. (The assigned tracefile identifier is included in the filename, and makes it easier to find your file in the udump directory)
select value from v$parameter where name = 'user_dump_dest';
If you don't have access to the tracefile, you're going to need to get help from the DBA to get you access. (The DBA can create a simple shell script that developers can run against a .trc file to run tkprof, and change the permissions on the trace file and on the tkprof output. You can also use the newer trcanlzr. There are Oracle Metalink notes on both.)

AFAIK, EXPLAIN uses database statistics to calculate the cost, so it can definitely differ from the actual performance.

In my experience EXPLAIN has been accurate and beneficial; if it weren't, it wouldn't be the useful tool it is. When was the last time you analyzed the tables? I have seen cases where the explain plan was nearly the same before and after an analyze, but the analyze made a huge performance gain.
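For example, a hedged sketch of refreshing statistics with DBMS_STATS rather than the older ANALYZE command (the table name is hypothetical):

begin
  dbms_stats.gather_table_stats(
    ownname => user,
    tabname => 'STUDENT',  -- hypothetical table
    cascade => true        -- also gather index statistics
  );
end;
/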

Related

In which order will the SQL queries execute?

I have a query as given below:
select
student_id, student_name, student_total
from
student
where
(student_name like '%a%' and student_total > 400)
or student_rank < 10;
In the SQL engine, how will this query execute? Will the conditions be checked from right to left, or from left to right?
No.
You don't know, and you must not depend on the answer.
The execution planner is the one that handles preparing the actual step-by-step plan of executing the query. This will tend to change with indices, statistics and such. It might very well evaluate the like first on a table with 100 rows, but the student_rank < 10 first on a table with 10 million rows and an index on student_rank. And if the statistics are right and you have an index on student_total, it might filter based on student_total first, even though it's deep inside the filter expression tree. The answer can also change with new versions of the engine, and possibly even with upgrades and updates to the server (e.g. the amount of memory available, total network and CPU load, ...)
Why do you care? It's the DB engine's problem to solve. And given that you're doing a like '%something%', it will most likely evaluate that condition last, as long as there's an index it can use for student_rank.
The fact that there's no definite order of execution also has implications that might surprise you. For example, if you have a function that throws an exception / error if it's passed a value of null, doing (SomeColumn is not null and MyFunction(SomeColumn)) is not safe - it will still throw the exception / error for any row with a null value in SomeColumn.
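A minimal sketch of that gotcha (the table and my_function are hypothetical; assume my_function raises an error when passed NULL):

-- NOT safe: the engine may evaluate my_function(somecolumn) before,
-- or regardless of, the IS NOT NULL predicate
select *
from   sometable
where  somecolumn is not null
and    my_function(somecolumn) = 1;

-- safer: CASE guarantees the NULL test is evaluated first
select *
from   sometable
where  case when somecolumn is not null
            then my_function(somecolumn) end = 1;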
Only the most primitive (barely-)SQL databases have any notion of a fixed order of execution. The thing you should focus on is making the SQL readable first and foremost. Performance tweaks must always be precisely documented, along with tests to replicate the intended behaviour etc., because they are extremely fragile. Before adding index hints, make sure your indices are properly maintained, with up-to-date statistics, low fragmentation, good coverage etc. When the execution planner produces sub-optimal execution plans, it's almost always your fault (and very rarely, a subtle bug in the engine, or a known limitation) - either by trying performance tricks in the SQL, or by having no DBA taking care of the maintenance.

Oracle ORDERED hint cost vs speed

So, a few weeks ago, I asked about Oracle execution plan cost vs speed in relation to the FIRST_ROWS(n) hint. I've run into a similar issue, but this time around the ORDERED hint. When I use the hint, my execution time improves dramatically (upwards of 90%), but the EXPLAIN PLAN for the query reports an enormous cost increase. In this particular query, the cost goes from 1500 to 24000.
The query is parameterized for pagination, and joins 19 tables to get the data out. I'd post it here, but it is 585 lines long and is written for a vendor's messy, godawful schema. Unless you happened to be intimately familiar with the product this is used for, it wouldn't be much help to see it. However, I gathered the schema stats at 100% shortly before starting work on tuning the query, so the CBO is not working in the dark here.
I'll try to summarize what the query does. The query essentially returns objects and their children in the system, and is structured as a large subquery block joined directly to several tables. The first part returns object IDs and is paginated inside its query block, before the joins to other tables. Then, it is joined to several tables that contain child IDs.
I know that the CBO is not all-knowing or infallible, but it really bothers me to see an execution plan this costly perform so well; it goes against a lot of what I've been taught. With the FIRST_ROWS hint, the solution was to provide a value n such that the optimizer could reliably generate the execution plan. Is there a similar kind of thing happening with the ORDERED hint for my query?
The reported cost is for the execution of the complete query, not just the first set of rows. (PostgreSQL does the costing slightly differently, in that it provides the cost for the initial return of rows and for the complete set).
For some plans the majority of the cost is incurred prior to returning the first rows (eg where a sort-merge is used), and for others the initial cost is very low but the cost per row is relatively high thereafter (eg. nested loop join).
So if you are optimising for the return of the first few rows and joining 19 tables, you may get a very low cost for the return of the first 20 with a nested loop-based plan. However, for the complete set of rows, the cost of that plan might be very much higher than that of others optimised for returning all rows at the expense of a delay in returning the first.
You should not rely on the execution cost to optimize a query. What matters is the execution time (and in some cases resource usages).
From the Concepts Guide:
The cost is an estimated value proportional to the expected resource use needed to execute the statement with a particular plan.
When the estimate is off, it is most often because the statistics available to the optimizer are misleading. You can correct that by giving the optimizer more accurate statistics. Check that the statistics are up to date. If they are, you can gather additional statistics, for example by enabling dynamic statistics gathering or by manually creating a histogram on a data-skewed column.
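For instance, a hedged sketch of creating a histogram on a skewed column with DBMS_STATS (the table and column names are hypothetical):

begin
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'ORDERS',                      -- hypothetical table
    method_opt => 'for columns status size 254'  -- histogram on the skewed column
  );
end;
/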
Another factor that can explain the disparity between relative cost and execution time is that the optimizer is built upon simple assumptions. For example:
Without a histogram, every value in a column is assumed to be uniformly distributed
An equality operator will select 5% of the rows (without a histogram or dynamic stats)
The data in each column is assumed to be independent of the data in every other column
Furthermore, for queries with bind variables, a single cost is computed for further executions (even if the bind value changes, possibly modifying the cardinality of the query)
...
These assumptions are made so that the optimizer can return an execution cost that is a single figure (and not an interval). For most queries these approximations don't matter much and the result is good enough.
However, you may find that sometimes the situation is simply too complex for the optimizer and even gathering extra statistics doesn't help. In that case you'll have to manually optimize the query, either by adding hints yourself, by rewriting the query or by using Oracle tools (such as SQL profiles).
If Oracle could devise a way to accurately determine the execution cost, we would never need to optimize a query manually in the first place!

Oracle execution plan cost vs speed

When building and tuning a query in Oracle, speed is generally the main concern for the developer. However, in tuning a particular query, I recently tried the FIRST_ROWS and NO_CPU_COSTING hints and an execution plan was generated that is 80% faster than the previous plan in execution time, but at a 300% higher cost. There is very little I/O in the execution plan, and it appears that all the additional cost comes from a nested loop outer join between two views.
This query is paginated, so I will only ever need the first few hundred rows. The lack of significant I/O leads me to think that this query will not be cache-dependent, and at first glance it seems like the way to go. However, since I've never seen a query increase in speed and cost so much at the same time, I'm not sure what the drawbacks to using this query might be. Are there any?
This is pretty typical of a query with an equijoin that is optimised to use a hash join when the full data set is required, and a nested loop when only the first few rows are needed, or where a sort is used for an ORDER BY on the full data set but an index can be used more efficiently for a subset.
Of course if the optimiser is not aware that you are only going to use a subset of the rows then it is not giving the cost for the query that you will actually execute, as it includes the cost for all the nested loop operations that are never going to execute.
However, there is nothing incorrect about the estimated cost; it just is what it is. If you want a more meaningful figure for your own understanding, then use a rownum limit, as in the sketch below.
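A minimal sketch of such a rownum limit, so that the plan is costed for the subset you will actually fetch (the inner query is hypothetical):

select *
from  (select /*+ first_rows(100) */ o.order_id, o.created_at
       from   orders o
       order  by o.created_at desc)
where rownum <= 100;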
By the way, FIRST_ROWS is deprecated in favour of first_rows(1), first_rows(10), first_rows(100) or first_rows(1000).

How to tell whether an explain plan is good - Oracle 10g

When is an Oracle explain plan considered good?
I'm trying to refactor a DB schema, and there are many queries in views and packages that are very slow.
For example, this is one of the most horrible queries, and it gives me this explain plan:
Plan
ALL_ROWS   Cost: 18,096   Bytes: 17   Cardinality: 1
I'm not asking how to fix the query, just how to decide whether an explain plan is good. Thanks!!
Before considering the result of an explain plan we need to understand the following terminology:
• Cardinality – Estimate of the number of rows coming out of each of the operations.
• Access method – The way in which the data is being accessed, via either a table scan or index access.
• Join method – The method (e.g., hash, sort-merge, etc.) used to join tables with each other.
• Join type – The type of join (e.g., outer, anti, semi, etc.).
• Join order – The order in which the tables are joined to each other.
• Partition pruning – Are only the necessary partitions being accessed to answer the query?
• Parallel execution – In case of parallel execution, is each operation in the plan being conducted in parallel? Is the right data redistribution method being used?
By reviewing the four key elements of cardinality estimation, access methods, join methods, and join order, you can determine whether the execution plan is the best available plan.
This white paper will help you, http://www.oracle.com/technetwork/database/focus-areas/bi-datawarehousing/twp-explain-the-explain-plan-052011-393674.pdf
The cost estimate is Oracle's educated guess at how many blocks it will need to visit in order to answer your query. Is 18,096 a good number? That depends on what you are doing, how fast your server is and how quickly you need it to run. There is little meaning in this number as an absolute value.
If you change the SQL or the indexes and the cost estimate goes down, that is a good sign, but what really matters is how long the query takes when it actually runs. Oracle can estimate badly at times.
Having said all that, it looks a bit high for something that runs while a user waits, but reasonable for a batch job.

How to use Explain Plan to optimize queries?

I have been tasked to optimize some SQL queries at work. Everything I have found points to using explain plan to identify problem areas. The problem is that I cannot find out exactly what the explain plan is telling me. You get Cost, Cardinality, and Bytes.
What do these indicate, and how should I be using them as a guide? Are low numbers better? High better? Any input would be greatly appreciated.
Or if you have a better way to go about optimizing a query, I would be interested.
I also assume you are using Oracle. And I also recommend that you check out the explain plan web page, for starters. There is a lot to optimization, but it can be learned.
A few tips follow:
First, when somebody tasks you to optimize, they are almost always looking for acceptable performance rather than ultimate performance. If you can reduce a query's running time from 3 minutes down to 3 seconds, don't sweat reducing it down to 2 seconds, until you are asked to.
Second, do a quick check to make sure the queries you are optimizing are logically correct. It sounds absurd, but I can't tell you the number of times I've been asked for advice on a slow running query, only to find out that it was occasionally giving wrong answers! And as it turns out, debugging the query often turned out to speed it up as well.
In particular, look for the phrase "Cartesian Join" in the explain plan. If you see it there, the chances are awfully good that you've found an unintentional cartesian join. The usual pattern for an unintentional cartesian join is that the FROM clause lists tables separated by comma, and the join conditions are in the WHERE clause. Except that one of the join conditions is missing, so that Oracle has no choice but to perform a cartesian join. With large tables, this is a performance disaster.
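A minimal sketch of the pattern (the tables are hypothetical); note the missing customers/orders join condition:

select c.name, o.total
from   customers c, orders o, order_lines l
where  o.order_id = l.order_id;
-- missing: and c.customer_id = o.customer_id, so Oracle must join
-- every customer to every matching order/line combination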
It is possible to see a Cartesian Join in the explain plan where the query is logically correct, but I associate this with older versions of Oracle.
Also look for the unused compound index. If the first column of a compound index is not used in the query, Oracle may use the index inefficiently, or not at all. Let me give an example:
The query was:
select * from customers
where
State = #State
and ZipCode = #ZipCode
(The DBMS was not Oracle, so the syntax was different, and I've forgotten the original syntax).
A quick peek at the indexes revealed an index on Customers with the columns
(Country, State, ZipCode), in that order. I changed the query to read:
select * from customers
where Country = #Country
and State = #State
and ZipCode = #ZipCode
and now it ran in about 6 seconds instead of about 6 minutes, because the optimizer was able to use the index to good advantage. I asked the application programmers why they had omitted the country from the criteria, and this was their answer: they knew that all the addresses had country equal to 'USA', so they figured they could speed up the query by leaving that criterion out!
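A hedged Oracle rendering of the same situation (the index name and bind variables are hypothetical):

create index customers_ix on customers (country, state, zipcode);

-- skips the leading index column: at best an index skip scan,
-- at worst a full table scan
select * from customers
where  state   = :state
and    zipcode = :zipcode;

-- supplies the leading column: an efficient index range scan
select * from customers
where  country = :country
and    state   = :state
and    zipcode = :zipcode;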
Unfortunately, optimizing database retrieval is not really the same as shaving microseconds off of computing time. It involves understanding the database design, especially indexes, and at least an overview of how the optimizer does its job.
You generally get better results from the optimizer when you learn to collaborate with it instead of trying to outsmart it.
Good luck coming up to speed at optimization!
You actually get more than that, depending on what you are doing. Check out this explain plan page. I'm assuming here that you are using Oracle and know how to run the script to display the plan output. What may be more important to start with is looking at the left-hand side for whether a particular index is used, and how that index is being utilized. You should see things like "(Full)", "(By Index Rowid)", etc. if you are doing joins. The cost is the next thing to look at, with lower costs being better, and you will notice that a join that is not using an index may produce a very large cost. You may also want to read details about the explain plan columns.
You got the fuzzy end of the lollipop.
There is absolutely no way, in isolation, without a ton of additional information and experience, to look at an explain plan and determine what (if anything) is causing less-than-optimum performance. If query tuning could be reduced to a 10-step process, it would be done by an automated process. I was about to list all of the things you need to understand to be effective at this, but that would be a very long list.
The only short answer I can think of is to look for steps in the plan that are going through way more bytes than you'd guess, then think about how you can reduce that number, via an index or partitioning.
Seriously, get Jonathan Lewis's book, Cost-Based Oracle Fundamentals.
Get Tom Kyte's book on Oracle database architecture, and rent a cabin in the woods for a few weeks.
This is a massive area of expertise (aka a black art).
The approach I generally take is:
Run the SQL statement in question,
Get the actual plan (look up dbms_xplan; see the sketch after this list),
Compare the estimated number of rows (cardinality) vs the actual number of rows. A big difference indicates a problem to be fixed (e.g. an index or a histogram)
Consider if you can create an index to speed part of the process (generally where you conceptually think the plan should go first). Try some indexes.
You need to understand the O() impact of different indexes in the context of what you are asking the database. It helps to understand data structures like B-trees, hash tables etc. Then, create an index that might work and repeat the process.
If Oracle decides not to use your index, apply an INDEX() hint and look at the new plan. Its cost will be greater than that of the plan Oracle chose; that is why it didn't pick your index. The hinted plan might lead to some insight about why your index is not good.
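As a minimal sketch of the first three steps above with dbms_xplan (the query is hypothetical; the GATHER_PLAN_STATISTICS hint enables rowsource statistics so actual row counts are recorded):

select /*+ gather_plan_statistics */ count(*)
from   orders
where  status = 'OPEN';

-- E-Rows vs A-Rows in this output is the estimated-vs-actual comparison
select *
from   table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));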