Oracle SQL when to use the full hint - sql

I understand that the full hint:
/*+ FULL(alias) */
will force the optimizer to perform a full table scan.
Are there any scenarios where this would be more useful than using an index besides the obvious one where the query selects the majority of the rows in the table?

Very wide topic.
From Oracle documentation "When using hints, in some cases, you might need to specify a full set of hints in order to ensure the optimal execution plan. For example, if you have a very complex query, which consists of many table joins, and if you specify only the INDEX hint for a given table, then the optimizer needs to determine the remaining access paths to be used, as well as the corresponding join methods. Therefore, even though you gave the INDEX hint, the optimizer might not necessarily use that hint, because the optimizer might have determined that the requested index cannot be used due to the join methods and access paths selected by the optimizer."
It avoids indexes and go for a FTS.
This has exceptions, like all general concepts. A full table scan can be less expensive than an index scan followed by table access by rowid - sometimes much less expensive.
An excerpt from:
Index blocks need to be read too, in addition to the table's blocks. If the index is large (a sizable percentage of the table's size), the pressure on the SGA (and associates latches) might be noticeable
If the sort order of the index doesn't match the way the data is stored in the actual table, the number of logical I/Os necessary to fulfill the query could be (potentially much) more than the number of LIOs to do the full table scan. (The index's clustering factor is one of the indicators you can look at to estimate this.)
Essentially, if scanning via your index makes the data block I/O requests jump all over your table, the overall cost is going to be higher than if you could do large sequential reads. An FTS is more likely to use multi-block direct-path reads, bypassing the SGA entirely, which is also potentially good - no "cache thrashing", less latching.
If you have a covering index, chances are that's going to beat a full table scan all the time. If not, it's going to depend on what percentage of actual blocks (data + index) will need to be processed (the index's selectivity for that query), and how well they're "physically sorted" with respect to each-other.
As to why the optimizer is picking the "wrong" path for you here: hard to tell. Stale statistics on the index or table could be an issue like always, the estimate calculated base on the LIKE might be off for certain patterns, instance parameters could favor indexes a bit too much, ... If this is the only query that's misbehaving, and your stats are up to date, using a /*+ full */ hint doesn't sound too bad.
Also read Stack documentation

Related

Can SQL index make search longer?

I heard this question during a job interview, and the interviewer said that yes. My question is why and could someone give an example of an index that makes search longer instead of shorter.
Yes, it can.
An additional index adds possible execution plans for a query if applicable. The Postgres query planner estimates costs for a variety of possible plans and the fastest estimate wins. Since those are estimates, actual query plans can always deviate. A chosen query plan using your new index can turn out to be slower than another plan without.
If your server is configured properly (cost and resource settings, current columns statistics, ...) this outcome is unlikely, but still possible. This can happen for almost every query. More likely for more complex queries. And some types of queries are notoriously hard to estimate.
Related:
Keep PostgreSQL from sometimes choosing a bad query plan
Also, indexes always add write cost, so if your database is write-heavy and the machine is already saturated, more indexes can bring overall performance down.
A trivial example would be on a table with very few rows.
An index search has to load the index into memory and then look up the original data. If a table has only a few rows, they probably fit onto one data page. So, a full table scan requires loading one page.
Any index search (on a cold cache) requires loading two data pages -- one for the index and one for the data. That can be (significantly) longer then just scanning the rows on a single page.
On a large table, if the "search" returns a significant proportion of the rows in the table, then an index search ends up fetching the rows in an order different from how they are stored. If the data pages do not fix in memory, then you have a situation called thrashing, which means that there is a high probability that each new row would be a cache miss.

How can I measure the cost of a database index?

Is there a good method for judging whether the costs of creating a database index in Postgres (slower INSERTS, time to build an index, time to re-index) are worth the performance gains (faster SELECTS)?
I am actually going to disagree with Hexist. PostgreSQL's planner is pretty good, and it supports good sequential access to table files based on physical order scans, so indexes are not necessarily going to help. Additionally there are many cases where the planner has to pick an index. Additionally you are already creating primary keys for unique constraints and primary keys.
I think one of the good default positions with PostgreSQL (MySQL btw is totally different!) is to wait until you need an index to add one and then only add the indexes you most clearly need. This is, however, just a starting point and it assumes either a lack of a general lack of experience in looking at query plans or a lack of understanding of where the application is likely to go. Having experience in these areas matters.
In general, where you have tables likely to span more than 10 pages (that's 40kb of data and headers), it's a good idea to foreign keys. These can be assumed tob e clearly needed. Small lookup tables spanning 1 page should never have non-unique indexes because these indexes are never going to be used for selects (no query plan beats a sequential scan over a single page).
Beyond that point you also need to look at data distribution. Indexing boolean columns is usually a bad idea and there are better ways to index things relating to boolean searches (partial indexes being a good example). Similarly indexing commonly used function output may seem like a good idea sometimes, but that isn't always the case. Consider:
CREATE INDEX gj_transdate_year_idx ON general_journal (extract('YEAR' FROM transdate));
This will not do much. However an index on transdate might be useful if paired with a sparse index scan via a recursive CTE.
Once the basic indexes are in place, then the question becomes what other indexes do you need to add. This is often better left to later use case review than it is designed in at first. It isn't uncommon for people to find that performance significantly benefits from having fewer indexes on PostgreSQL.
Another major thing to consider is what sort of indexes you create and these are often use-case specific. A b-tree index on an array record for example might make sense if ordinality is important to the domain, and if you are frequently searching based on initial elements, but if ordinality is unimportant, I would recommend a GIN index, because a btree will do very little good (of course that is an atomicity red flag, but sometimes that makes sense in Pg). Even when ordinality is important, sometimes you need GIN indexes anyway because you need to be able to do commutitive scans as if ordinality was not. This is true if using ip4r for example to store cidr blocks and using an EXCLUDE constraint to ensure that no block contains any other block (the actual scan requires using an overlap operator rather than a contain operator since you don't know which side of the operator the violation will be found on).
Again this is somewhat database-specific. On MySQL, Hexist's recommendations would be correct, for example. On PostgreSQL, though, it's good to watch for problems.
As far as measuring, the best tool is EXPLAIN ANALYZE
Generally speaking, unless you have a log or archive table where you wont be doing selects on very frequently (or it's ok if they take awhile to run), you should index on anything your select/update/deelete statements will be using in a where clause.
This however is not always as simple as it seems, as just because a column is used in a where clause and is indexed, doesn't mean the sql engine will be able to use the index. Using the EXPLAIN and EXPLAIN ANALYZE capabilities of postgresql you can examine what indexes were used in selects and help you figure out if having an index on a column will even help you.
This is generally true because without an index your select speed goes down from some O(log n) looking operation down to O(n), while your insert speed only improves from cO(log n) to dO(log n) where d is usually less than c, ie you may speed up your inserts a little by not having an index, but you're going to kill your select speed if they're not indexed, so it's almost always worth it to have an index on your data if you're going to be selecting against it.
Now, if you have some small table that you do a lot of inserts and updates on, and frequently remove all the entries, and only periodically do some selects, it could turn out to be faster to not have any indexes.. however that would be a fairly special case scenario, so you'd have to do some benchmarking and decide if it made sense in your specific case.
Nice question. I'd like to add a bit more what #hexist had already mentioned and to the info provided by #ypercube's link.
By design, database don't know in which part of the table it will find data that satisfies provided predicates. Therefore, DB will perform a full or sequential scan of all table's data, filtering needed rows.
Index is a special data structure, that for a given key can precisely specify in which rows of the table such values will be found. The main difference when index is involved:
there is a cost for the index scan itself, i.e. DB has to find a value in the index first;
there's an extra cost of reading specific data from the table itself.
Working with index will lead to a random IO pattern, compared to a sequential one used in the full scan. You can google for the comparison figures of random and sequential disk access, but it might differ up to an order of magnitude (random being slower of course).
Still, it's clear that in some cases Index access will be cheaper and in others Full scan should be preferred. This depends on how many rows (out of all) will be returned by the specified predicate, or it's selectivity:
if predicate will return a relatively small number of rows, say, less then 10% of total, then it seems valuable to pick those directly via Index. This is a typical case for Primary/Unique keys or queries like: I need address information for customer with internal number = XXX;
if predicate has no big impact on the selectivity, i.e. if 30% (or more) rows are returned, then it's cheaper to do a Full scan, 'cos sequential disk access will beat random and data will be delivered faster. All reports, covering big areas (like a month, or all customers) fall here;
if there's a need to obtain an ordered list of values and there's an index, then doing Index scan is the fastest option. This is a special case of #2, when you need report data ordered by some column;
if number of distinct values in the column is relatively small compared to a total number of values, then Index will be a good choice. This is a case called Loose Index Scan, and typical queries will be like: I need 20 most recent purchases for each of the top 5 categories by number of goods.
How DB decides what to do, Index or Full scan? This is a runtime decision and it is based on the statistics, so make sure to keep those up to date. In fact, numbers provided above have no real life value, you have to evaluate each query independently.
All this is a very rough description of what happens. I would very much recommended to look into How PostgreSQL Planner Uses Statistics, this best what I've seen on the subject.

Can Indices actually decrease SELECT performance?

After reading some stuff about indices on SQL Server and their performance advantages for selects and disadvantages for updates / inserts, i was wondering if badly used indices could actually also hurt performance for selects.
What conditions would have to be fulfilled to have an index decrease performance of a pure select query? Do such situations exist?
Thanks!
(although I always try to include code examples, i can't think of anything that would support this question...)
Yes, albeit very slightly - so slightly that it would be justified to also answer "No".
If you have an index which might be considered for a query, but is not useable, the optimizer will waste a short time pondering whether and how to use it (in rare cases with REALLY complicated indexes and views, and more frequently when index performance hints are wrong, you might end up choosing a suboptimal query plan).
Some cases would be:
a table without indexes
a table with a badly chosen index, which gets discarded
a table where TWO indexes exist, and for some reason (e.g. obsolete statistics), the existence of the second index makes the optimizer choose it, while it would have been more convenient to use the first.
a table where the existing index (usually also thanks to obsolete statistics) tricks the optimizer into reading from the index an amount of data comparable to what could have been, more efficiently, retrieved with a full table scan; to make things worse, the index is fragmented and hashed differently than the table. What was essentially a full table scan becomes a slowed down full table scan with lots of disk thrashing.
In the first two cases the query time is the same (and entails a full scan), but in the third, you also have to analyze and discard the index. In the fourth, unlikely but possible, case an execution time which is likely very large increases and becomes huge (update 2021-10-20: I have just done this to myself. Yay me).
Where an index is likelier to hurt you - where ALL indexes hurt you - is in inserts, deletes and updates. Then, any index not used by the update query, yet affected by same, will require a write to the index itself.
So you will want to have indexes, but as few as you can without sacrificing SELECT performances. Actually, you might decide against indexing for a rarely used SELECT query in order to avoid having the needed index constantly updated by all other UPDATE queries.
Edit: after reading Heinzi's answer, I'd also like to add that most DB servers have maintenance tools which analyze the tables and indexes (and sometimes query performance counters too), and properly update the hints of which Heinzi spoke. So it's also important to periodically "maintain" the database to keep the optimizer supplied with up-to-date information on which indexes to choose from.
Update (MySQL)
There is a very nifty MySQL analysis tool that can actually suggest improvements to the existing indexing (remove unused keys, add useful keys): common_schema. It's really worth a look.
Yes, but it's very unlikely and it should not influence your decision to use indexes.
Sometimes, the SQL Server query analyzer chooses the an execution plan that's not optimal. Since the number of possible execution plans is much larger than it might seem on first sight (a simple join of n tables already produces n! possible execution plans), SQL Server has to make an educated guess. It's in the nature of guesses that they are sometimes wrong.
It's a rare occurrence, but I've seen it happen a couple of times in the past years. In that case (and only in that case), a better plan would have been chosen if the index had not been there. However, removing the index is not the correct way to solve this problem, since the index usually exists for a reason. The correct way is to add a hint to this query (and only to this query), to help the optimizer choose the right plan.
Yes, indexes can hurt performance for SELECTs. It is important to understand how database engines operate. Data is stored on disk(s) in "pages". Indexes make it possible to access the specific page that has a specific value in one or more columns in the table.
This is great if you are looking for specific values.
However, consider a query that needs to look at every row in a table. If you go through the table, you read the pages in order and -- critically -- you get every row on the page with a single read. The number of reads is the number of pages in the table. In addition, the page cache can optimize the reads with look-ahead reads and pages no longer being used are simply overwritten.
Using an index for the same reads goes through the table one record at a time rather than one page at a time. This results in random reads through the pages. In the worst case, there is one read per record in the table -- potentially a very significant hit to performance. In addition, the index itself occupies some of the page cache, reducing memory for other operations.
In generally, the optimizer component of a SQL engine does a good job distinguishing between these two situations. One of the key metrics is the selectivity of the query. How many rows is the query returning (which the optimizer looks at with respect to the number of pages)? If the number of rows is about the same as the number of pages, the optimizer would consider a full table scan rather than an index scan.
There are definitely other considerations, but in general, an index can hurt performance of even a simple select query. In general, optimizers do a good job, but there are sometimes unusual cases that trick even the best optimizers.
My guess would be if you create indices that confuse the query plan optimiser, and that ends up choosing an inefficient index for the query at hand.
This is potentially implementation-dependent, but in principle indexes should not slow down SELECT.
Obviously they can slow down INSERT and UPDATE.

Are SQL Execution Plans based on Schema or Data or both?

I hope this question is not too obvious...I have already found lots of good information on interpreting execution plans but there is one question I haven't found the answer to.
Is the plan (and more specifically the relative CPU cost) based on the schema only, or also the actual data currently in the database?
I am try to do some analysis of where indexes are needed in my product's database, but am working with my own test system which does not have close to the amount of data a product in the field would have. I am seeing some odd things like the estimated CPU cost actually going slightly UP after adding an index, and am wondering if this is because my data set is so small.
I am using SQL Server 2005 and Management Studio to do the plans
It will be based on both Schema and Data. The Schema tells it what indexes are available, the Data tells it which is better.
The answer can change in small degrees depending on the DBMS you are using (you have not stated), but they all maintain statistics against indexes to know whether an index will help. If an index breaks 1000 rows into 900 distinct values, it is a good index to use. If an index only results in 3 different values for 1000 rows, it is not really selective so it is not very useful.
SQL Server is 100% cost-based optimizer. Other RDBMS optimizers are usually a mix of cost-based and rules-based, but SQL Server, for better or worse, is entirely cost driven. A rules based optimizer would be one that can say, for example, the order of the tables in the FROM clause determines the driving table in a join. There are no such rules in SQL Server. See SQL Statement Processing:
The SQL Server query optimizer is a
cost-based optimizer. Each possible
execution plan has an associated cost
in terms of the amount of computing
resources used. The query optimizer
must analyze the possible plans and
choose the one with the lowest
estimated cost. Some complex SELECT
statements have thousands of possible
execution plans. In these cases, the
query optimizer does not analyze all
possible combinations. Instead, it
uses complex algorithms to find an
execution plan that has a cost
reasonably close to the minimum
possible cost.
The SQL Server query optimizer does
not choose only the execution plan
with the lowest resource cost; it
chooses the plan that returns results
to the user with a reasonable cost in
resources and that returns the results
the fastest. For example, processing a
query in parallel typically uses more
resources than processing it serially,
but completes the query faster. The
SQL Server optimizer will use a
parallel execution plan to return
results if the load on the server will
not be adversely affected.
The query optimizer relies on
distribution statistics when it
estimates the resource costs of
different methods for extracting
information from a table or index.
Distribution statistics are kept for
columns and indexes. They indicate the
selectivity of the values in a
particular index or column. For
example, in a table representing cars,
many cars have the same manufacturer,
but each car has a unique vehicle
identification number (VIN). An index
on the VIN is more selective than an
index on the manufacturer. If the
index statistics are not current, the
query optimizer may not make the best
choice for the current state of the
table. For more information about
keeping index statistics current, see
Using Statistics to Improve Query
Performance.
Both schema and data.
It takes the statistics into account when building a query plan, using them to approximate the number of rows returned by each step in the query (as this can have an effect on the performance of different types of joins, etc).
A good example of this is the fact that it doesn't bother to use indexes on very small tables, as performing a table scan is faster in this situation.
I can't speak for all RDBMS systems, but Postgres specifically uses estimated table sizes as part of its efforts to construct query plans. As an example, if a table has two rows, it may choose a sequential table scan for the portion of the JOIN that uses that table, whereas if it has 10000+ rows, it may choose to use an index or hash scan (if either of those are available.) Incidentally, it used to be possible to trigger poor query plans in Postgres by joining VIEWs instead of actual tables, since there were no estimated sizes for VIEWs.
Part of how Postgres constructs its query plans depend on tunable parameters in its configuration file. More information on how Postgres constructs its query plans can be found on the Postgres website.
For SQL Server, there are many factors that contribute to the final execution plan. On a basic level, Statistics play a very large role but they are based on the data but not always all of the data. Statistics are also not always up to date. When creating or rebuilding an Index, the statistics should be based on a FULL / 100% sample of the data. However, the sample rate for automatic statistics refreshing is much lower than 100% so it is possible to sample a range that is in fact not representative of much of the data. Estimated number of rows for the operation also plays a role which can be based on the number of rows in the table or the statistics on a filtered operation. So out-of-date (or incomplete) Statistics can lead the optimizer to choose a less-than-optimal plan just as a few rows in a table can cause it to ignore indexes entirely (which can be more efficient).
As mentioned in another answer, the more unique (i.e. Selective) the data is the more useful the index will be. But keep in mind that the only guaranteed column to have statistics is the leading (or "left-most" or "first") column of the Index. SQL Server can, and does, collect statistics for other columns, even some not in any Indexes, but only if AutoCreateStatistics DB option is set (and it is by default).
Also, the existence of Foreign Keys can help the optimizer when those fields are in a query.
But one area not considered in the question is that of the Query itself. A query, slightly changed but still returning the same results, can have a radically different Execution Plan. It is also possible to invalidate the use of an Index by using:
LIKE '%' + field
or wrapping the field in a function, such as:
WHERE DATEADD(DAY, -1, field) < GETDATE()
Now, keep in mind that read operations are (ideally) faster with Indexes but DML operations (INSERT, UPDATE, and DELETE) are slower (taking more CPU and Disk I/O) as the Indexes need to be maintained.
Lastly, the "estimated" CPU, etc. values for cost are not always to be relied upon. A better test is to do:
SET STATISTICS IO ON
run query
SET STATISTICS IO OFF
and focus on "logical reads". If you reduce Logical Reads then you should be improving performance.
You will, in the end, need a set of data that comes somewhat close to what you have in Production in order to performance tune with regards to both Indexes and the Queries themselves.
Oracle specifics:
The stated cost is actually an estimated execution time, but it is given in a somewhat arcane unit of measure that has to do with estimated time for block reads. It's important to realize that the calculated cost doesn't say much about the runtime anyway, unless each and every estimate made by the optimizer was 100% perfect (which is never the case).
The optimizer uses the schema for a lot of things when deciding what transformations/heuristics can be applied to the query. Some examples of schema things that matter a lot when evaluating xplans:
Foreign key constraints (can be used for table elimiation)
Partitioning (exclude entire ranges of data)
Unique constraints (index unique vs range scans for example)
Not null constraints (anti-joins are not available with not in() on nullable columns
Data types (type conversions, specialized date arithmetics)
Materialized views (for rewriting a query against an aggregate)
Dimension Hierarchies (to determine functional dependencies)
Check constraints (the constraint is injected if it lowers cost)
Index types (b-tree(?), bitmap, joined, function based)
Column order in index (a = 1 on {a,b} = range scan, {b,a} = skip scan or FFS)
The core of the estimates comes from using the statistics gathered on actual data (or cooked). Statistics are gathered for tables, columns, indexes, partitions and probably something else too.
The following information is gathered:
Nr of rows in table/partition
Average row/col length (important for costing full scans, hash joins, sorts, temp tables)
Number of nulls in col (is_president = 'Y' is pretty much unique)
Distinct values in col (last_name is not very unique)
Min/max value in col (helps unbounded range conditions like date > x)
...to help estimate the nr of expected rows/bytes returned when filtering data. This information is used to determine what access paths and join mechanisms are available and suitable given the actual values from the SQL query compared to the statistics.
On top of all that, there is also the physical row order which affects how "good" or attractive an index become vs a full table scan. For indexes this is called "clustering factor" and is a measure of how much the row order matches the order of the index entries.

How do you interpret a query's explain plan?

When attempting to understand how a SQL statement is executing, it is sometimes recommended to look at the explain plan. What is the process one should go through in interpreting (making sense) of an explain plan? What should stand out as, "Oh, this is working splendidly?" versus "Oh no, that's not right."
I shudder whenever I see comments that full tablescans are bad and index access is good. Full table scans, index range scans, fast full index scans, nested loops, merge join, hash joins etc. are simply access mechanisms that must be understood by the analyst and combined with a knowledge of the database structure and the purpose of a query in order to reach any meaningful conclusion.
A full scan is simply the most efficient way of reading a large proportion of the blocks of a data segment (a table or a table (sub)partition), and, while it often can indicate a performance problem, that is only in the context of whether it is an efficient mechanism for achieving the goals of the query. Speaking as a data warehouse and BI guy, my number one warning flag for performance is an index based access method and a nested loop.
So, for the mechanism of how to read an explain plan the Oracle documentation is a good guide: http://download.oracle.com/docs/cd/B28359_01/server.111/b28274/ex_plan.htm#PFGRF009
Have a good read through the Performance Tuning Guide also.
Also have a google for "cardinality feedback", a technique in which an explain plan can be used to compare the estimations of cardinality at various stages in a query with the actual cardinalities experienced during the execution. Wolfgang Breitling is the author of the method, I believe.
So, bottom line: understand the access mechanisms. Understand the database. Understand the intention of the query. Avoid rules of thumb.
This subject is too big to answer in a question like this. You should take some time to read Oracle's Performance Tuning Guide
The two examples below show a FULL scan and a FAST scan using an INDEX.
It's best to concentrate on your Cost and Cardinality. Looking at the examples the use of the index reduces the Cost of running the query.
It's a bit more complicated (and i don't have a 100% handle on it) but basically the Cost is a function of CPU and IO cost, and the Cardinality is the number of rows Oracle expects to parse. Reducing both of these is a good thing.
Don't forget that the Cost of a query can be influenced by your query and the Oracle optimiser model (eg: COST, CHOOSE etc) and how often you run your statistics.
Example 1:
SCAN http://docs.google.com/a/shanghainetwork.org/File?id=dd8xj6nh_7fj3cr8dx_b
Example 2 using Indexes:
INDEX http://docs.google.com/a/fukuoka-now.com/File?id=dd8xj6nh_9fhsqvxcp_b
And as already suggested, watch out for TABLE SCAN. You can generally avoid these.
Looking for things like sequential scans can be somewhat useful, but the reality is in the numbers... except when the numbers are just estimates! What is usually far more useful than looking at a query plan is looking at the actual execution. In Postgres, this is the difference between EXPLAIN and EXPLAIN ANALYZE. EXPLAIN ANALYZE actually executes the query, and gets real timing information for every node. That lets you see what's actually happening, instead of what the planner thinks will happen. Many times you'll find that a sequential scan isn't an issue at all, instead it's something else in the query.
The other key is identifying what the actual expensive step is. Many graphical tools will use different sized arrows to indicate how much different parts of the plan cost. In that case, just look for steps that have thin arrows coming in and a thick arrow leaving. If you're not using a GUI you'll need to eyeball the numbers and look for where they suddenly get much larger. With a little practice it becomes fairly easy to pick out the problem areas.
Really for issues like these, the best thing to do is ASKTOM. In particular his answer to that question contains links to the online Oracle doc, where a lot of the those sorts of rules are explained.
One thing to keep in mind, is that explain plans are really best guesses.
It would be a good idea to learn to use sqlplus, and experiment with the AUTOTRACE command. With some hard numbers, you can generally make better decisions.
But you should ASKTOM. He knows all about it :)
The output of the explain tells you how long each step has taken. The first thing is to find the steps that have taken a long time and understand what they mean. Things like a sequential scan tell you that you need better indexes - it is mostly a matter of research into your particular database and experience.
One "Oh no, that's not right" is often in the form of a table scan. Table scans don't utilize any special indexes and can contribute to purging of every useful in memory caches. In postgreSQL, for example, you will find it looks like this.
Seq Scan on my_table (cost=0.00..15558.92 rows=620092 width=78)
Sometimes table scans are ideal over, say, using an index to query the rows. However, this is one of those red-flag patterns that you seem to be looking for.
Basically, you take a look at each operation and see if the operations "make sense" given your knowledge of how it should be able to work.
For example, if you're joining two tables, A and B on their respective columns C and D (A.C=B.D), and your plan shows a clustered index scan (SQL Server term -- not sure of the oracle term) on table A, then a nested loop join to a series of clustered index seeks on table B, you might think there was a problem. In that scenario, you might expect the engine to do a pair of index scans (over the indexes on the joined columns) followed by a merge join. Further investigation might reveal bad statistics making the optimizer choose that join pattern, or an index that doesn't actually exist.
look at the percentage of time spent in each subsection of the plan, and consider what the engine is doing. for example, if it is scanning a table, consider putting an index on the field(s) that is is scanning for
I mainly look for index or table scans. This usually tells me I'm missing an index on an important column that's in the where statement or join statement.
From http://www.sql-server-performance.com/tips/query_execution_plan_analysis_p1.aspx:
If you see any of the following in an
execution plan, you should consider
them warning signs and investigate
them for potential performance
problems. Each of them are less than
ideal from a performance perspective.
* Index or table scans: May indicate a need for better or additional indexes.
* Bookmark Lookups: Consider changing the current clustered index,
consider using a covering index, limit
the number of columns in the SELECT
statement.
* Filter: Remove any functions in the WHERE clause, don't include wiews
in your Transact-SQL code, may need
additional indexes.
* Sort: Does the data really need to be sorted? Can an index be used to
avoid sorting? Can sorting be done at
the client more efficiently?
It is not always possible to avoid
these, but the more you can avoid
them, the faster query performance
will be.
Rules of Thumb
(you probably want to read up on the details too:
Oracle Docs
ASKTOM
SQL Server Docs
)
Bad
Table Scans of Several Large Tables
Good
Using a unique index
Index includes all required fields
Most Common Win
In about 90% of performance problems I have seen, the easiest win is to break up a query with lots (4 or more) of tables into 2 smaller queries and a temporary table.