I have been tasked with optimizing some SQL queries at work. Everything I have found points to using Explain Plan to identify problem areas, but the problem is I can't work out exactly what the explain plan is telling me. You get Cost, Cardinality, and Bytes.
What do these indicate, and how should I be using them as a guide? Are low numbers better? High numbers better? Any input would be greatly appreciated.
Or if you have a better way to go about optimizing a query, I would be interested.
I assume you are using Oracle, and I recommend that you check out the explain plan web page for starters. There is a lot to optimization, but it can be learned.
A few tips follow:
First, when somebody tasks you to optimize, they are almost always looking for acceptable performance rather than ultimate performance. If you can reduce a query's running time from 3 minutes down to 3 seconds, don't sweat reducing it down to 2 seconds, until you are asked to.
Second, do a quick check to make sure the queries you are optimizing are logically correct. It sounds absurd, but I can't tell you the number of times I've been asked for advice on a slow running query, only to find out that it was occasionally giving wrong answers! And as it turns out, debugging the query often turned out to speed it up as well.
In particular, look for the phrase "Cartesian Join" in the explain plan. If you see it there, the chances are awfully good that you've found an unintentional cartesian join. The usual pattern for an unintentional cartesian join is that the FROM clause lists tables separated by comma, and the join conditions are in the WHERE clause. Except that one of the join conditions is missing, so that Oracle has no choice but to perform a cartesian join. With large tables, this is a performance disaster.
It is possible to see a Cartesian Join in the explain plan where the query is logically correct, but I associate this with older versions of Oracle.
Also look for the unused compound index. If the first column of a compound index is not used in the query, Oracle may use the index inefficiently, or not at all. Let me give an example:
The query was:
select * from customers
where
State = #State
and ZipCode = #ZipCode
(The DBMS was not Oracle, so the syntax was different, and I've forgotten the original syntax).
A quick peek at the indexes revealed an index on Customers with the columns
(Country, State, ZipCode) in that order. I changed the query to read
select * from customers
where Country = #Country
and State = #State
and ZipCode = #ZipCode
and now it ran in about 6 seconds instead of about 6 minutes, because the optimizer was able to use the index to good advantage. I asked the application programmers why they had omitted the country from the criteria, and this was their answer: they knew that all the addresses had country equal to 'USA' so they figured they could speed up the query by leaving that criterion out!
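For illustration only, the compound index in that story would have looked something like this (table and column names as described above; the DDL is a hypothetical reconstruction, not the original):

-- Hypothetical reconstruction of the compound index described above.
-- Country is the leading column, so a query that omits Country cannot
-- use this index for an efficient range scan.
CREATE INDEX ix_customers_addr
    ON Customers (Country, State, ZipCode);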
Unfortunately, optimizing database retrieval is not really the same as shaving microseconds off of computing time. It involves understanding the database design, especially indexes, and at least an overview of how the optimizer does its job.
You generally get better results from the optimizer when you learn to collaborate with it instead of trying to outsmart it.
Good luck coming up to speed at optimization!
You get more than that actually depending on what you are doing. Check out this explain plan page. I'm assuming a little bit here that you are using Oracle and know how to run the script to display the plan output. What may be more important to start with is looking at the left hand side for the use of a particular index or not and how that index is being utilized. You should see things like "(Full)", "(By Index Rowid)", etc if you are doing joins. The cost would be the next thing to look at with lower costs being better and you will notice that if you are doing a join that is not using an index you may get a very large cost. You may also want to read details about the explain plan columns.
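If you do not already have a script handy, a minimal way to produce and display a plan in Oracle looks like the sketch below (EMP and DEPT are placeholder tables, not from your schema):

-- Generate an execution plan without running the query.
EXPLAIN PLAN FOR
    SELECT e.ename, d.dname
    FROM   emp e
    JOIN   dept d ON d.deptno = e.deptno
    WHERE  e.sal > 1000;

-- Display the plan just explained, including cost, cardinality and bytes.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);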
You got the fuzzy end of the lollipop.
There is absolutely no way, in isolation, without a ton of additional information and experience, to look at an explain plan and determine what (if anything) is causing less than optimum performance. If query tuning could be reduced to a 10 step process it would be done by an automated process. I was about to list all of the things you need to understand to be effective at this but that would be a very long list.
The only short answer I can think of is to look for steps in the plan that are going through far more bytes than you'd guess. Then think about how you can reduce that number, via an index or partitioning.
Seriously, get Jonathan Lewis's book, Cost-Based Oracle Fundamentals.
Get Tom Kyte's book on Oracle Database Architecture and rent a cabin in the woods for a few weeks.
This is a massive area of expertise (aka a black art).
The approach I generally take is:
Run the SQL statement in question,
Get the actual plan (look up dbms_xplan),
Compare the estimated number of rows (cardinality) vs actual number of rows. A big difference indicates a problem to be fixed (e.g. index, histogram)
Consider if you can create an index to speed part of the process (generally where you conceptually think the plan should go first). Try some indexes.
You need to understand the O() impact of different indexes in the context of what you are asking the database. It helps to understand data structures like B-trees, hash tables, etc. Then create an index that might work and repeat the process.
If Oracle decides not to use your index, apply an INDEX() hint and look at the new plan. The cost will be greater than the plan it did choose - this is why it didn't pick your index. The hinted plan might lead to some insight about why your index is not good.
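A rough sketch of that workflow (the table, columns and index name below are made up; dbms_xplan itself is the real package):

-- 1. Run the statement with rowsource statistics enabled so that
--    actual row counts are captured.
SELECT /*+ gather_plan_statistics */ *
FROM   orders
WHERE  customer_id = 42;

-- 2. Fetch the actual plan for the last statement in this session;
--    'ALLSTATS LAST' shows estimated (E-Rows) next to actual (A-Rows) rows.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));

-- 3. If Oracle ignores an index you expected it to use, force it with a
--    hint and compare the cost of the hinted plan with the original.
SELECT /*+ INDEX(o orders_cust_idx) */ *
FROM   orders o
WHERE  customer_id = 42;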
Related
I have a lot of records in a table. When I execute the following query it takes a lot of time. How can I improve the performance?
SET ROWCOUNT 10
SELECT StxnID
,Sprovider.description as SProvider
,txnID
,Request
,Raw
,Status
,txnBal
,Stxn.CreatedBy
,Stxn.CreatedOn
,Stxn.ModifiedBy
,Stxn.ModifiedOn
,Stxn.isDeleted
FROM Stxn,Sprovider
WHERE Stxn.SproviderID = SProvider.Sproviderid
AND Stxn.SProviderid = ISNULL(#pSProviderID,Stxn.SProviderid)
AND Stxn.status = ISNULL(#pStatus,Stxn.status)
AND Stxn.CreatedOn BETWEEN ISNULL(#pStartDate,getdate()-1) and ISNULL(#pEndDate,getdate())
AND Stxn.CreatedBy = ISNULL(#pSellerId,Stxn.CreatedBy)
ORDER BY StxnID DESC
The stxn table has more than 100,000 records.
The query is run from a report viewer in asp.net c#.
This is my go-to article when I'm trying to do a search query that has several search conditions which might be optional.
http://www.sommarskog.se/dyn-search-2008.html
The biggest problem with your query is the column=ISNULL(#column, column) syntax. MSSQL won't use an index for that. Consider changing it to (column = #column AND #column IS NOT NULL)
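Another pattern discussed in articles like the one linked above keeps the "filter is optional" behaviour while still giving the optimizer the actual parameter values to work with. A sketch, using the question's parameters (shown here with the usual @ prefix) and assuming they are the procedure's parameters:

SELECT Stxn.StxnID, Stxn.Status, Stxn.CreatedOn, Stxn.CreatedBy
FROM   Stxn
       INNER JOIN Sprovider ON Stxn.SproviderID = Sprovider.SproviderID
WHERE  (@pSProviderID IS NULL OR Stxn.SProviderID = @pSProviderID)
  AND  (@pStatus      IS NULL OR Stxn.Status      = @pStatus)
  AND  (@pSellerId    IS NULL OR Stxn.CreatedBy   = @pSellerId)
  AND  Stxn.CreatedOn BETWEEN ISNULL(@pStartDate, GETDATE() - 1)
                          AND ISNULL(@pEndDate,   GETDATE())
ORDER BY Stxn.StxnID DESC
OPTION (RECOMPILE);  -- with OPTION (RECOMPILE), newer versions of SQL Server can plan for the actual parameter values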
You should consider using the execution plan and looking for missing indexes. Also, how long does it take to execute? What is slow for you?
Maybe you could also return fewer rows, but that is just a guess. Really, we need to see your tables and indexes, plus the execution plan.
Check sql-tuning-tutorial
For one, use SELECT TOP (n) instead of SET ROWCOUNT - the optimizer will have a much better chance that way. Another suggestion is to use a proper INNER JOIN instead of the old-style table,table join syntax, which makes it much easier to end up with an accidental cartesian product (that is not the case here, but it happens much more easily with the old syntax). Should be:
...
FROM Stxn INNER JOIN Sprovider
ON Stxn.SproviderID = SProvider.Sproviderid
...
And if you think 100K rows is a lot, or that this volume is a reason for slowness, you're sorely mistaken. Most likely you have really poor indexing strategies in place, possibly some parameter sniffing, possibly some implicit conversions... hard to tell without understanding the data types, indexes and seeing the plan.
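Putting those two suggestions together, a sketch of the rewritten query (column names copied from the question, parameters shown with the usual @ prefix; the optional ISNULL(@param, column) filters are left out here because other answers discuss rewriting them):

SELECT TOP (10)
       StxnID
      ,Sprovider.description as SProvider
      ,txnID
      ,Request
      ,Raw
      ,Status
      ,txnBal
      ,Stxn.CreatedBy
      ,Stxn.CreatedOn
      ,Stxn.ModifiedBy
      ,Stxn.ModifiedOn
      ,Stxn.isDeleted
FROM   Stxn
       INNER JOIN Sprovider ON Stxn.SproviderID = SProvider.Sproviderid
WHERE  Stxn.CreatedOn BETWEEN ISNULL(@pStartDate, GETDATE() - 1)
                          AND ISNULL(@pEndDate,   GETDATE())
ORDER BY StxnID DESC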
There are a lot of things that could impact the performance of a query, although 100k records really isn't all that many.
Items to consider (in no particular order)
Hardware:
Is SQL Server memory constrained? In other words, does it have enough RAM to do its job? If it is swapping memory to disk, then this is a sure sign that you need an upgrade.
Is the machine disk constrained? In other words, are the drives fast enough to keep up with the queries you need to run? If it's memory constrained, then disk speed becomes a larger factor.
Is the machine processor constrained? For example, when you execute the query does the processor spike for long periods of time? Or, are there already lots of other queries running that are taking resources away from yours...
Database Structure:
Do you have indexes on the columns used in your where clause? If the tables do not have indexes then it will have to do a full scan of both tables to determine which records match.
Eliminate the ISNULL function calls. If this is a direct query, have the calling code validate the parameters and set default values before executing. If it is in a stored procedure, do the checks at the top of the procedure. Unless you are executing this WITH RECOMPILE (which allows sniffing of the actual parameter values), those functions will have to be evaluated for each row.
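For example, a sketch of the stored-procedure version (assuming the @p... parameters are the procedure's parameters):

-- Resolve the defaults once, at the top of the procedure,
-- instead of calling ISNULL() inside the WHERE clause.
SET @pStartDate = ISNULL(@pStartDate, GETDATE() - 1);
SET @pEndDate   = ISNULL(@pEndDate,   GETDATE());

-- The date filter can now use plain comparisons that an index can support.
SELECT Stxn.StxnID, Stxn.Status, Stxn.CreatedOn
FROM   Stxn
WHERE  Stxn.CreatedOn BETWEEN @pStartDate AND @pEndDate;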
Network:
Is the network slow between you and the server? Depending on the amount of data pulled you could be pulling GB's of data across the wire. I'm not sure what is stored in the "raw" column. The first question you need to ask here is "how much data is going back to the client?" For example, if each record is 1MB+ in size, then you'll probably have disk and network constraints at play.
General:
I'm not sure what "slow" means in your question. Does it mean that the query is taking around 1 second to process or does it mean it's taking 5 minutes? Everything is relative here.
Basically, it is going to be impossible to give a hard answer without you answering a lot of questions. All of this will bear out if you profile the queries, understand what and how much is going back to the client, and watch the interactions among the various parts.
Finally depending on the amount of data going back to the client there might not be a way to improve performance short of hardware changes.
Make sure Stxn.SproviderID, Stxn.status, Stxn.CreatedOn, Stxn.CreatedBy, Stxn.StxnID and SProvider.Sproviderid all have indexes defined.
(NB -- you might not need all, but it can't hurt.)
I don't see much that can be done on the query itself, but I can see things being done on the schema:
Create an index / PK on Stxn.SproviderID
Create an index / PK on SProvider.Sproviderid
Create indexes on status, CreatedOn, CreatedBy, StxnID
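A sketch of the DDL (the index and constraint names are made up; skip any that already exist):

-- Primary keys.
ALTER TABLE Sprovider ADD CONSTRAINT PK_Sprovider PRIMARY KEY (Sproviderid);
ALTER TABLE Stxn      ADD CONSTRAINT PK_Stxn      PRIMARY KEY (StxnID);

-- Foreign-key column used in the join.
CREATE INDEX IX_Stxn_SproviderID ON Stxn (SproviderID);

-- Columns used in the WHERE clause and the ORDER BY.
CREATE INDEX IX_Stxn_CreatedOn ON Stxn (CreatedOn);
CREATE INDEX IX_Stxn_Status    ON Stxn (Status);
CREATE INDEX IX_Stxn_CreatedBy ON Stxn (CreatedBy);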
Something to consider: when ROWCOUNT or TOP is used with an ORDER BY clause, the entire result set is built and sorted first, and only then are the top 10 results returned (unless an index already delivers the rows in that order).
How does this run without the Order By clause?
SELECT
/*+ INDEX(ID_BL_REF_NO REF_number_BL_idx*/ DECODE(BL_TYPE,'E',BL_ORIGIN_NAME,'I',BL_FINAL_NAME) FROM_PORT,
DECODE(BL_TYPE,'I',BL_ORIGIN_NAME,'E',BL_FINAL_NAME) TO_PORT,
(BL_VESSEL_CONNECT||'/'||BL_VOYAGE_CONNECT||'/'||BL_PORT_CONNECT) Mother_vessel_voyage_port,
SUM(BLC_SIZE) No_of_20s,
SUM(BLC_SIZE) No_of_40s,
SUM(DECODE(BLC_SIZE,'20',1,'40',2)) Teus,
SUM(BLC_GROSSWT) GrossWt,
round((BLC_GROSSWT/SUM(DECODE(BLC_SIZE,'20',1,'40',2))),2) AverageWt,
SUM(DECODE(BLF_MODE,'P',BLF_LOCAL_AMOUNT)) PREPAID,
SUM(DECODE(BLF_MODE,'C',BLF_LOCAL_AMOUNT)) COLLECT,
SUM(DECODE(BLF_MODE,'E',BLF_LOCAL_AMOUNT)) ELSEWHERE,
(SUM(DECODE(BLF_MODE,'P',BLF_LOCAL_AMOUNT)+DECODE(BLF_MODE,'C',BLF_LOCAL_AMOUNT)+DECODE(BLF_MODE,'E',BLF_LOCAL_AMOUNT))/SUM(DECODE(BLC_SIZE,'20',1,'40',2))) AVERAGE
FROM ID_BL_DETAILS,id_bl_containers,ID_BL_FREIGHT
WHERE BL_REFNO=BLC_REFNO
AND BLF_REFNO=BLC_REFNO
GROUP BY BL_VESSEL_CONNECT,BL_VOYAGE_CONNECT,BL_PORT_CONNECT,BL_ORIGIN_NAME,BL_LODPORT,BL_DISPORT,BL_FINAL_NAME,BLC_GROSSWT,BL_TYPE
Your WHERE clause contains only joins. There are no filters. This means your query needs to consider all the rows in at least one table. From this it follows that your query should execute a FULL TABLE SCAN of at least one of your tables, not an indexed read. A full table scan is the most efficient way of getting all the rows in a table.
So don't fix the syntax of your INDEX hint, get rid of it.
Next, figure out which table should drive your query. This is business logic. Probably your requirement is something like
"Summarise BL_DETAILS and BL_FREIGHT
for every row in BL_CONTAINERS."
In which case you might think you need a full table scan of BL_CONTAINERS. But if BL_FREIGHT has more rows than BL_CONTAINERS and every BLF_REF_NO matches a BL_REF_NO (i.e. there is a foreign key on BL_FREIGHT.BLF_REF_NO referencing BL_CONTAINERS.BL_REF_NO) it would probably be better to drive from BL_FREIGHT.
Note that this is true if you are only interested in BL_CONTAINERS which have matching BL_FREIGHT rows. But if you want to include containers which have not been used (i.e. they have no matching BL_FREIGHT records) you need to use outer joins and drive off the BL_CONTAINERS table.
These considerations get more complicated when you throw BL_DETAILS into the mix. Your report seems to be based around the BL_DETAILS categories (as Jeffrey observes it is hard for us to understand your query without aliases or describes). So perhaps BL_DETAILS is the right candidate for driving table.
As you can see, tuning requires insight into the business logic and the details of the data model. You have that local knowledge, we do not.
There are tools which can help you. Oracle has EXPLAIN PLAN which will show you how the database will execute the query. The query optimizer gets better with each release so it matters which version of the database you're using. Here is the documentation for 10g.
The important thing to note is that you need to give the database accurate statistics, in order for it to come up with a good plan. Find out more.
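Gathering statistics is straightforward with DBMS_STATS. A sketch (the schema name is a placeholder; the table is one from the query above):

-- Refresh optimizer statistics for one table and its indexes (Oracle).
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'APP_OWNER',      -- placeholder schema name
    tabname => 'ID_BL_DETAILS',  -- table from the query above
    cascade => TRUE);            -- include the table's indexes
END;
/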
Run EXPLAIN on your query and make sure the proper indexes are set up. That will improve the speed of the query.
http://www.sql.org/sql-database/postgresql/manual/sql-explain.html
Your question states that the query takes 9.968 seconds and you want it to be 0.5 seconds or less. This can only be done effectively (if possible at all) when you know where those 9.968 seconds are spent on. And to know where time in a query is being spent, you'll not only want to explain the statement, you'll want to trace an execution of that query. The latter will give you a breakdown of how the time in your query is spent.
There are two threads on OTN that describe how you can do that.
If you want to do the bare minimum, please follow this one:
http://forums.oracle.com/forums/thread.jspa?messageID=1812597
And if you want to give full details, please follow this one:
http://forums.oracle.com/forums/thread.jspa?threadID=863295
Happy tracing!
Regards,
Rob.
I am not very proficient in T-SQL yet (I have been writing it for the last 4/5 months), but I have written many queries. Although they produce the right output, I sometimes feel the queries are not well optimized. I searched Google and found a lot of material about query optimization, and it says to look into the query plan (actual and estimated) for performance improvement.
As I already said, I am very new to writing queries, so it is difficult for me to grasp those solutions. But I need to learn query optimization.
Can anybody help me with how and where I should start?
Searching the internet reveals that a SEEK is better than a SCAN (be it an index or a table). How can I achieve a seek instead of a scan?
They also say that the ORDER BY clause, i.e. sorting, is costly. What is the workaround? How can I write an effective query?
Can anybody explain to me, with some examples, which kind of query is better than another, and in what situation?
Edited
Dear all,
You have all answered, and that will help me a lot. But what I meant to say is that you have all practised a lot to become experts. Once upon a time, I guess, you were all where I am now. So my humble request is: how did you all start out writing optimised queries? I know that patience is needed, and I will devote that.
I apologise for any wrong statement of mine.
Thanks in advance
Articles discussing Query Optimization issues are often very factual and useful, but as you found out they can be hard to follow. It is a bit like when someone is trying to learn the basics rules of baseball, and all the sports commentary he/she finds on the subject is rife with acronyms and strategic details about the benefits of sacrificing someone at bat, and other "inside baseball" trivia...
So you need to learn the basics first:
the structure(s) of the database storage
indexes' structure, the clustered and non-clustered kinds, multi-column indexes
the concept of covering a query
the selectivity of a particular column
the disadvantage of indexes when it comes to CRUD operations
the basic subtasks/strategies of a query: table or index scan, index seek, sorting, inner-outer merge etc.
the log file, the data recovery model.
The following links apply to MS SQL Server. If that is not the DBMS you are using you can try and find similar material for the system of your choice. In fact, so long as you realize that the implementation may vary, it may be useful to peruse the MS documention.
MS SQL storage structures
MS SQL pages and extents
Then, as you have started doing, learn to read query plans (even if you do not fully understand them at first), and all this should bring you to a level where you can start to make sense of the more advanced books or articles on the topic. I do not know of tutorials for query plans on the Internet (though I'm quite sure they exist), but the following methodology may be of use: start with simple queries and review the query plan (if possible in a graphic fashion); start recognizing the most common elements: Table Scan, Index Seek, Sort, Nested Loops... Read the detailed properties of these operators: estimated number of rows, cost percentage, etc. When you find a new element that you do not know or understand, use this keyword to find details on the internet. Also: experiment a lot.
Finally, you should remember that while the way the query is written and the set of indexes provided cover a great part of optimization needs, there are other sources of optimization, for example the way the hardware is put to use (a basic example is how, by having the data file and the log file on separate physical disks, we can greatly improve CRUD performance).
Searching the internet reveals that a SEEK is better than a SCAN (be it an index or a table). How can I achieve a seek instead of a scan?
Add the necessary index -- if the incremental costs on INSERT and UPDATE (and extra storage) are an overall win to speed up the seeking in your queries.
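As a small illustration (made-up table and column names, SQL Server syntax): an equality filter on an unindexed column forces a scan, while adding an index lets the optimizer seek instead.

-- Before: no index on Email, so this query scans the whole table.
SELECT CustomerID, Email
FROM   dbo.Customers
WHERE  Email = 'someone@example.com';

-- After: the optimizer can do an index seek on this index.
CREATE NONCLUSTERED INDEX IX_Customers_Email
    ON dbo.Customers (Email);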
They also say that the ORDER BY clause, i.e. sorting, is costly. What is the workaround? How can I write an effective query?
Add the necessary index -- if the incremental costs on INSERT and UPDATE (and extra storage) are an overall win to speed up the ordering in your queries.
Can anybody explain to me, with some examples, which kind of query is better than another, and in what situation?
You already pointed out a couple of specific questions -- and the answers were nearly identical. What good would it do to add another six?
Run benchmark queries over representative artificial data sets (they must resemble what you're planning to have in production -- if you have small toy-sized tables, the query plans will not be representative or meaningful), try with and without the indexes that appear to be suggested by the different query plans, measure performance; rinse, repeat.
It takes 10,000 hours of practice to be good at anything. Optimizing DB schemas, indices, queries, etc, is no exception;-).
ORDER BY is a necessary evil - there's no way around it.
Refer to this question for solving index seek, scan and bookmark/key lookups. And this site is very good for optimization techniques...
Always ensure that you have indexes on your tables. Not too many and not too few.
Using SQL Server 2005, apply included columns in these indexes; they help with lookups (see the sketch after this list).
ORDER BY is costly: if sorting is not required, why sort the data?
Always filter as early as possible; if you reduce the number of joins, function calls, etc. as early as possible, you reduce the overall time taken.
avoid cursors if you can
use temp tables / table variables for filtering where possible
remote queries will cost you
queries with subselects in the WHERE clause can be hurtful
table functions can be costly if not filtered
As always, there is no hard rule, and things should be taken on a per-query basis.
Always write the query to be as understandable/readable as possible, and optimize when needed.
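A sketch of an index with included columns, as mentioned above (SQL Server 2005+, made-up names): the key column supports the seek, and the INCLUDE columns let the index cover the query so no lookup is needed.

CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)           -- seek on this key
    INCLUDE (OrderDate, TotalAmount);    -- covered columns, no key lookup needed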
EDIT to comment question:
Temp tables can be used when you need to add indexes on the temp table (you cannot add indexes on table variables, except the PK). I mostly use table variables when I can, and only have the required fields in them, like so:
DECLARE @Table TABLE(
    FundID INT PRIMARY KEY   -- INT is an assumed data type; the original omitted it (table variables also need the @ prefix)
)
I would use this to hold my fund group IDs instead of joining to tables that are less optimized.
I read a couple of articles the other day and, to my surprise, found that table variables are actually created in tempdb.
Also, I have heard, and found, that table UDFs can seem like a "black box" to the query planner. Once again, we tend to move the selects from the table functions into table variables, and then join on those table variables. But as mentioned earlier, write the code first, then optimize when you find bottlenecks.
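A sketch of that pattern (dbo.FundGroups() is a hypothetical table-valued function, as are the other names): materialise its result into a table variable once, then join to it.

-- Hypothetical example: the planner treats dbo.FundGroups() as a black box,
-- so capture its output first and join to the table variable instead.
DECLARE @FundGroups TABLE (FundID INT PRIMARY KEY);

INSERT INTO @FundGroups (FundID)
SELECT FundID FROM dbo.FundGroups();

SELECT f.FundID, f.FundName
FROM   dbo.Funds AS f
       INNER JOIN @FundGroups AS fg ON fg.FundID = f.FundID;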
I have found that CTEs can be useful, but also that when the level of recursion grows, they can be very slow...
I have a query that has 7 inner joins (because a lot of the information is distributed in other tables), a few coworkers have been surprised. I was wondering if they should be surprised or is having 7 inner joins normal?
It's not unheard of, but I would place it in a view for ease of use and maintenance.
Two questions:
Does it work?
Can you explain it?
If so, then seven is fine. If you can't explain the query, then seven is too many.
It's normal if your schema is in the 5th normal form : )
Depending on what you are trying to accomplish, a large number of joins in a query is not remarkable.
Personally, I would be less concerned with the number of joins employed to return a desired result set and more concerned with whether the query is optimized and running within acceptable parameters.
If the query is fully optimized and cannot be trimmed down but the query itself does not execute quickly enough then it is possible that the data structure design is not the right fit with what you're trying to do. At which point you can re-evaluate what you're trying to accomplish or the structure of the data that is feeding your business model.
It's not at all unusual. With a system like Siebel it's common to see join counts in double figures.
Seven joins makes it tougher for readability, but more important are performance and scalability. If those are OK, go for it.
It's probably not normal but it's certainly not excessive. If you find yourself joining the same tables over and over, create some views.
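For example, a sketch with made-up tables: wrap the joins you keep repeating in a view, and let callers query the view as if it were a single table.

CREATE VIEW dbo.vw_OrderDetails
AS
SELECT o.OrderID, o.OrderDate, c.CustomerName, p.ProductName, ol.Quantity
FROM   dbo.Orders     AS o
       INNER JOIN dbo.Customers  AS c  ON c.CustomerID = o.CustomerID
       INNER JOIN dbo.OrderLines AS ol ON ol.OrderID   = o.OrderID
       INNER JOIN dbo.Products   AS p  ON p.ProductID  = ol.ProductID;

-- Callers now write a single-table query.
SELECT * FROM dbo.vw_OrderDetails WHERE OrderDate >= '20100101';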
The number of joins depends on your data model; 7 joins can be in your query if that is what you are querying for. I recall having similar queries in an app I worked on a long time ago. Performance depends on many factors (table size, indexes, server load, server config, to name a few), and I do not think it can be generalized that 7 joins are bad.
If it works for you, then I guess it's fine :D
Yes, it's normal - but, really, it's not such a great idea from a performance perspective. Since query plans are built on estimated costs, there is an increase in the number of errors as you increase joins (or any other operator, for that matter):
The SQL Server Query Optimizer will estimate a minimum of one row coming out of a seek operator. This is done to avoid the case when a very expensive subtree is picked due to an cardinality underestimation. If the subtree is estimated to return zero rows, many plans cost about the same and there can be errors in plan selection as a result. So, you’ll notice that the estimation is “high” for this case, and some errors could result. You also might notice that we estimate 20 executions of this branch instead of the actual 10. However, given the number of joins that have been evaluated before this operator, being off by a factor of 2 (10 rows) isn’t considered to be too bad. (Errors can increase exponentially with the number of joins).
Also, the optimizer attempts to balance the time required to come up with a plan versus the potential savings - it won't spend all day trying to find the most optimal plan. The more joins, the greater the number of alternative plans exist - some of which may be more optimal than the optimizer has time to find.
It's certainly not unusual. However at least in Oracle, 7 is a special number, as any more than that and the optimizer can no longer test every join order (due to factorial growth in the number of possibilities). So it would be wise to avoid going over 7 unless you're prepared to babysit your execution plan.
7 or even more is not at all unusual in data warehouses where a fact table could easily have foreign keys to a dozen dimensions. In the data warehouse scenario, the cardinality of the dimensions is usually low compared to the fact table, so filters on the dimensions help the fact table be utilized through an index seek or scan.
For a normalized transactional schema, it is not usually a problem if the cardinality of the results set is low in the primary base table (i.e. select everything about one customer), because the foreign keys can normally simply result in index seeks or index scans.
7 is fine if your database design requires it. However, if 7 is necessary to achieve your goal, I'd reexamine the database design to make sure this level of obscurity is really necessary.
Out of curiosity, is this DB2? Just a pattern I've noticed :)
Is this 7 inner joins on the same table, 7 inner joins on different tables, or 7 nested inner joins?
...trick question! It really doesn't matter; if that is what your database structure requires, then it is correct.
Caveat: if it is 7 nested inner joins on the same table, you probably have a poorly structured table ;-)
I think what you want to avoid is a join depth greater than 7. Seven inner joins with a depth of less than 7 certainly isn't unheard of, but sometimes people hear "7 joins" and think the problem is the number of joins rather than the depth.
When attempting to understand how a SQL statement is executing, it is sometimes recommended to look at the explain plan. What is the process one should go through in interpreting (making sense) of an explain plan? What should stand out as, "Oh, this is working splendidly?" versus "Oh no, that's not right."
I shudder whenever I see comments that full tablescans are bad and index access is good. Full table scans, index range scans, fast full index scans, nested loops, merge join, hash joins etc. are simply access mechanisms that must be understood by the analyst and combined with a knowledge of the database structure and the purpose of a query in order to reach any meaningful conclusion.
A full scan is simply the most efficient way of reading a large proportion of the blocks of a data segment (a table or a table (sub)partition), and, while it often can indicate a performance problem, that is only in the context of whether it is an efficient mechanism for achieving the goals of the query. Speaking as a data warehouse and BI guy, my number one warning flag for performance is an index based access method and a nested loop.
So, for the mechanism of how to read an explain plan the Oracle documentation is a good guide: http://download.oracle.com/docs/cd/B28359_01/server.111/b28274/ex_plan.htm#PFGRF009
Have a good read through the Performance Tuning Guide also.
Also have a google for "cardinality feedback", a technique in which an explain plan can be used to compare the estimations of cardinality at various stages in a query with the actual cardinalities experienced during the execution. Wolfgang Breitling is the author of the method, I believe.
So, bottom line: understand the access mechanisms. Understand the database. Understand the intention of the query. Avoid rules of thumb.
This subject is too big to answer in a question like this. You should take some time to read Oracle's Performance Tuning Guide
The two examples below show a FULL scan and a FAST scan using an INDEX.
It's best to concentrate on your Cost and Cardinality. Looking at the examples the use of the index reduces the Cost of running the query.
It's a bit more complicated (and I don't have a 100% handle on it), but basically the Cost is a function of CPU and I/O cost, and the Cardinality is the number of rows Oracle expects each step to return. Reducing both of these is a good thing.
Don't forget that the Cost of a query can be influenced by your query and the Oracle optimiser model (eg: COST, CHOOSE etc) and how often you run your statistics.
Example 1 (FULL scan): http://docs.google.com/a/shanghainetwork.org/File?id=dd8xj6nh_7fj3cr8dx_b
Example 2 (using an INDEX): http://docs.google.com/a/fukuoka-now.com/File?id=dd8xj6nh_9fhsqvxcp_b
And as already suggested, watch out for TABLE SCAN. You can generally avoid these.
Looking for things like sequential scans can be somewhat useful, but the reality is in the numbers... except when the numbers are just estimates! What is usually far more useful than looking at a query plan is looking at the actual execution. In Postgres, this is the difference between EXPLAIN and EXPLAIN ANALYZE. EXPLAIN ANALYZE actually executes the query, and gets real timing information for every node. That lets you see what's actually happening, instead of what the planner thinks will happen. Many times you'll find that a sequential scan isn't an issue at all, instead it's something else in the query.
The other key is identifying what the actual expensive step is. Many graphical tools will use different sized arrows to indicate how much different parts of the plan cost. In that case, just look for steps that have thin arrows coming in and a thick arrow leaving. If you're not using a GUI you'll need to eyeball the numbers and look for where they suddenly get much larger. With a little practice it becomes fairly easy to pick out the problem areas.
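In Postgres the difference is just one keyword (the table and column here are placeholders):

-- Estimates only: what the planner thinks will happen.
EXPLAIN SELECT * FROM my_table WHERE status = 'open';

-- Actually runs the query and reports real row counts and timings per node.
EXPLAIN ANALYZE SELECT * FROM my_table WHERE status = 'open';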
Really, for issues like these, the best thing to do is ASKTOM. In particular, his answer to that question contains links to the online Oracle docs, where a lot of those sorts of rules are explained.
One thing to keep in mind, is that explain plans are really best guesses.
It would be a good idea to learn to use sqlplus, and experiment with the AUTOTRACE command. With some hard numbers, you can generally make better decisions.
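A minimal SQL*Plus session using AUTOTRACE might look like this (the query is a placeholder):

-- Show the plan and execution statistics without printing the result set.
SET AUTOTRACE TRACEONLY EXPLAIN STATISTICS

SELECT * FROM emp WHERE deptno = 10;

SET AUTOTRACE OFF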
But you should ASKTOM. He knows all about it :)
The output of the explain tells you how costly each step is expected to be (or, with EXPLAIN ANALYZE, how long each step actually took). The first thing is to find the steps that take a long time and understand what they mean. Things like a sequential scan tell you that you need better indexes - it is mostly a matter of research into your particular database and experience.
One "Oh no, that's not right" is often in the form of a table scan. Table scans don't utilize any special indexes and can contribute to purging of every useful in memory caches. In postgreSQL, for example, you will find it looks like this.
Seq Scan on my_table (cost=0.00..15558.92 rows=620092 width=78)
Sometimes table scans are ideal over, say, using an index to query the rows. However, this is one of those red-flag patterns that you seem to be looking for.
Basically, you take a look at each operation and see if the operations "make sense" given your knowledge of how it should be able to work.
For example, if you're joining two tables, A and B on their respective columns C and D (A.C=B.D), and your plan shows a clustered index scan (SQL Server term -- not sure of the oracle term) on table A, then a nested loop join to a series of clustered index seeks on table B, you might think there was a problem. In that scenario, you might expect the engine to do a pair of index scans (over the indexes on the joined columns) followed by a merge join. Further investigation might reveal bad statistics making the optimizer choose that join pattern, or an index that doesn't actually exist.
Look at the percentage of time spent in each subsection of the plan and consider what the engine is doing. For example, if it is scanning a table, consider putting an index on the field(s) it is scanning for.
I mainly look for index or table scans. This usually tells me I'm missing an index on an important column that's in the where statement or join statement.
From http://www.sql-server-performance.com/tips/query_execution_plan_analysis_p1.aspx:
If you see any of the following in an execution plan, you should consider them warning signs and investigate them for potential performance problems. Each of them is less than ideal from a performance perspective.
* Index or table scans: may indicate a need for better or additional indexes.
* Bookmark lookups: consider changing the current clustered index, consider using a covering index, limit the number of columns in the SELECT statement.
* Filter: remove any functions in the WHERE clause, don't include views in your Transact-SQL code, may need additional indexes.
* Sort: does the data really need to be sorted? Can an index be used to avoid sorting? Can sorting be done at the client more efficiently?
It is not always possible to avoid these, but the more you can avoid them, the faster query performance will be.
Rules of Thumb
(you probably want to read up on the details too: Oracle Docs, ASKTOM, SQL Server Docs)
Bad
Table Scans of Several Large Tables
Good
Using a unique index
Index includes all required fields
Most Common Win
In about 90% of performance problems I have seen, the easiest win is to break up a query with lots (4 or more) of tables into 2 smaller queries and a temporary table.
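A sketch of that pattern in SQL Server syntax (all table and column names are made up): materialise the small, highly filtered part of the query into a temporary table first, then join the remaining large tables to it.

-- Step 1: resolve the selective part of the query into a temp table.
SELECT c.CustomerID, c.Region
INTO   #ActiveCustomers
FROM   dbo.Customers AS c
WHERE  c.Region = 'EMEA' AND c.IsActive = 1;

CREATE INDEX IX_ActiveCustomers ON #ActiveCustomers (CustomerID);

-- Step 2: join the remaining large tables to the much smaller temp table.
SELECT ac.CustomerID, o.OrderID, o.OrderDate, ol.ProductID, ol.Quantity
FROM   #ActiveCustomers AS ac
       INNER JOIN dbo.Orders     AS o  ON o.CustomerID = ac.CustomerID
       INNER JOIN dbo.OrderLines AS ol ON ol.OrderID   = o.OrderID;

DROP TABLE #ActiveCustomers;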