SQL Server Query Optimizer Execution Plan ""Reason for Early Termination of Statement: TimeOut" - sql

Environment:
Windows 2008 R2, SQL Server 2008 SP1
Problem:
Table inner Join Query with 9 tables gets "Reason for Early Termination of Statement: TimeOut" in Actual Execution Plan.
Wrong Memory Grant is assigned.
Sorting was done in tempdb instead of ram even there is plenty of ram available.
Performance is slow when sorting is done in tempdb.
To Consider:
If I Drop, Create, and Drop one of the index from one of the 9 tables, query will get full optimization again and sorting is done in ram with full performance. However, as soon as I modify the query a little bit, it would lose the full optimization again.
Question:
Has anyone seen that problem before?
Is there any way to get more information on why the "timeout" happening?
What is the Optimizer really doing?
I would consider making a ramdisk for tempdb if there is no solution how to ratify the query optimizer problem. What are the risks to use ramdisk for tempdb?
Things I have tried already:
UPDATE STATS ON {ALL 9 Tables} WITH FULLSCAN
INDEX REBUILD works temporarily like the DROP CREATE DROP action I did above.

It could be that your indexes are getting fragmented. This can happen if you are inserting/deleting/updating the 9 tables a lot.
Example te rebuild an index:
ALTER INDEX MyIndex ON MyTable REBUILD
After some time the inserts/deletes/updates will fragment your indexes again. You can schedule the rebuild to keep performance from degrading. For example see:
http://technet.microsoft.com/en-us/library/ms180074(v=sql.100).aspx

Here is an article that discusses this: https://blogs.msdn.microsoft.com/psssql/2018/10/19/understanding-optimizer-timeout-and-how-complex-queries-can-be-affected-in-sql-server/
What are the Symptoms?
Here are some of the factors involved: You have a complex query that
involves lots of joined tables (for example, 8 or more tables are
joined).
Note that having lots of tables joined together means there are many possible join orders for SQL to consider.
One of the possible solutions is to break up into intermediate queries with temp tables; another is to determine the optimal join order yourself and use hint FORCEORDER.

Related

SQL Server In memory tables using tempdb

I'm testing in memory tables on SQL 2014 and the execution plan for a particular query is showing a sort in tempdb.
The query has a join between two in memory tables (on a field that has a nonclustered index in both tables) and a group by with a few sums and counts - the server has plenty of RAM available.
Why is it the query being sorted in tempdb if the tables are"In Memory"?
I'm also wondering why is SQL Suggesting to create an index if the "create index" statement is not allowed on in memory tables.
This query is being run in interop mode. This uses the normal query processor except that it pulls data out of Hekaton tables. Otherwise there are no changes that I can think of.
The speedup that you obtain from Hekaton here is extremely limited if any.
You cannot use the create index statement but you can create indexes at table creation time.
The data may be stored in memory, but the query processor can still only use the memory it's been given in it's memory grant to actually execute the query, if the memory required by the sort operator exceeds this because of a poor estimate by the optimizer, it will spill the same way as a query that uses disk based tables will.

Oracle LEADING hint -- why is this required?

Suddenly (but unfortunately I don't know when "suddenly" was; I know it ran fine at some point in the past) one of my queries started taking 7+ seconds instead of milliseconds to execute. I have 1 local table and 3 tables being accessed via a DB link. The 3 remote tables are joined together, and one of them is joined with my local table.
The local table's where clause only takes a few millis to execute on its own, and only returns a few (10's or 100's at the most) records. The 3 remote tables have many hundreds of thousands, possibly millions, of records between them, and if I join them appropriately I get tens or hundreds of thousands of records.
I am only joining with the remote tables so that I can pull out a few pieces of data related to each record in my local table.
What appears to be happening, however, is that Oracle joins the remote tables together first and then my local table to that mess at the end. This is always going to be a bad idea, especially given the data set that exists right now, so I added a /*+ LEADING(local_tab remote_tab_1) */ hint to my query and it now returns in milliseconds.
I compared the explain plans and they are almost identical, save for a single BUFFER SORT on one of the remote tables.
I'm wondering what might cause Oracle to approach this the wrong way? Is it an index issue? What should I be looking for?
When choosing an execution plan, oracle estimates costs for the different plans. One crucial information for that estimate is the amount of rows will get returned from a step of the execution plan. Oracle tries to estimate those using 'statistics', i.e. information about how many rows a table contains, how many different values a column contains; How evenly these values are distributed.
These statistics are just that statistics, and they might be wrong, which is one of the most important reasons for misjudgments of the oracle optimizer.
So gathering new statistics as described in a comment might help. Have a look at the documentation on that dbms_stats package. There are many different ways to call that package.
A common problem I've come across is a query that joins many tables, where the joins form a chain from one end to another, e.g.:
SELECT *
FROM tableA, tableB, tableC, tableD, tableE
WHERE tableA.ID0 = :bind1
AND tableA.ID1 = tableB.ID1
AND tableB.ID2 = tableC.ID2
AND tableC.ID3 = tableD.ID3
AND tableD.ID4 = tableE.ID4
AND tableE.ID5 = :bind2;
Notice how the optimiser might choose to drive the query from tableA (e.g. if the index on ID0 is nicely selective) or from tableE (if the index on tableE.ID5 is more selective).
The statistics on the tables might cause the choice between these two plans to balance on a knife-edge; one day it's working fine (driving from tableA), next day new stats are gathered and all of a sudden the alternative plan driving from tableE has a lower cost and is chosen.
In this circumstance, adding a LEADING hint is one way to nudge it back to the original plan (i.e. drive from tableA) without dictating too much to the optimiser (i.e. it doesn't force the optimiser to choose any particular join methods).
You're doing distributed query optimization, and that's a tricky beast. It could be that the your table's statistics are current, but now the tables at the remote system are out-of-whack or have changed. Or the remote system added/removed/modified indexes, and that broke your plan. (This is an excellent reason to consider replication -- so you can control indexes and statistics against it.)
That said, Oracle's estimate of cardinality is a primary driver in execution plan. A 10053 trace analysis (Jonathan Lewis' Cost-Based Oracle Fundamentals book has wonderful examples from 8i to 10.1) can help shed light on why your statement's now broken and how the LEADING hint fixes it.
The DRIVING_SITE hint might be a better choice if you know you always want the local tables to be joined first before going after the remote site; it clarifies your intention without driving the plan the way a LEADING hint would.
Might not be relevant but I had a similar situation once where the remote table had been replaced by a single-table view. When it was a table the distributed query optimizer 'saw' that it had an index. When it became a view it couldn't see the index anymore and couldn't cost a plan that used an index on the remote object.
That was a few years ago. I documented my analysis at the time here.
RI,
It's hard to be sure about the cause of the performance problems without seeing the SQL.
When an Oracle query was performing well before, and suddenly starts performing badly, it is usually related to one of two issues:
A) Statistics are out of date. This is the easiest and quickest thing to check, even if you have a housekeeping batch process that's supposed to take care of it ... always double-check.
B) Data volume / data pattern change.
In your case, running a distributed query across multiple databases makes it 10x harder for Oracle to manage performance between them. Is it possible to put these tables in one database, perhaps separate schema owners in one database?
Hints are notoriously fragile, as Oracle is under no obligations to follow the hint. When the data volume or pattern changes some more, Oracle may just ignore the hint and do what it thinks is best (ie. worst ;-).
If you cannot put these tables all in one database, then I recommend you look to break your query up into two statements:
INSERT on sub-SELECT to copy external data to a global temporary table in your current database.
SELECT from the global temporary table to join with your other table.
You will have complete control over performance of step 1 above without resorting to hints. This approach typically scales well, providing you take time to do the performance tuning. I've seen this approach solve many complex performance problems.
The overhead for Oracle to create a whole new table, or insert a heap of records, is much smaller than most people expect. Defining a global temporary table further reduces that overhead.
Matthew

Getting rid of Table Spool in SQL Server Execution plan

I have a query that creates several temporary tables and then inserts data into them. From what I understand this is a potential cause of Table Spool. When I look at my execution plan the bulk of my processing is spent on Table Spool. Are there any good techniques for improving these types of performance problems? Would using a view or a CTE offer me any benefits over the temp tables?
I also noticed that when I mouse over each table spool the output list is from the same temporary table.
Well, with the information you gave I can tell only that: the query optimizer has chosen the best possible plan. It uses table spools to speed up execution. Alternatives that don't use table spools would be even slower.
How about showing the query, the table(s) schema and cardinality, and the plan.
Update
I certainly understand if you cannot show us the query. But is really hard to guess why the spooling is preffered by the engine whithout knowing any specifics. I recommend you go over Craig Freedman's blog, he is an engineer in the query optimizer team and has explained a lot of the inner workings of SQL 2005/2008 optimizer. Here are some entries I could quickly find that touch the topic of spooling in one form or another:
Recursive CTEs
Ranking Functions: RANK, DENSE_RANK, and NTILE
More on TOP
SQL customer support team also has an interesting blog at http://blogs.msdn.com/psssql/
And 'sqltips' (the relational engine's team blog) has some tips, like Spool operators in query plan...

What generic techniques can be applied to optimize SQL queries?

What techniques can be applied effectively to improve the performance of SQL queries? Are there any general rules that apply?
Use primary keys
Avoid select *
Be as specific as you can when building your conditional statements
De-normalisation can often be more efficient
Table variables and temporary tables (where available) will often be better than using a large source table
Partitioned views
Employ indices and constraints
Learn what's really going on under the hood - you should be able to understand the following concepts in detail:
Indexes (not just what they are but actually how they work).
Clustered indexes vs heap allocated tables.
Text and binary lookups and when they can be in-lined.
Fill factor.
How records are ghosted for update/delete.
When page splits happen and why.
Statistics, and how they effect various query speeds.
The query planner, and how it works for your specific database (for instance on some systems "select *" is slow, on modern MS-Sql DBs the planner can handle it).
The biggest thing you can do is to look for table scans in sql server query analyzer (make sure you turn on "show execution plan"). Otherwise there are a myriad of articles at MSDN and elsewhere that will give good advice.
As an aside, when I started learning to optimize queries I ran sql server query profiler against a trace, looked at the generated SQL, and tried to figure out why that was an improvement. Query profiler is far from optimal, but it's a decent start.
There are a couple of things you can look at to optimize your query performance.
Ensure that you just have the minimum of data. Make sure you select only the columns you need. Reduce field sizes to a minimum.
Consider de-normalising your database to reduce joins
Avoid loops (i.e. fetch cursors), stick to set operations.
Implement the query as a stored procedure as this is pre-compiled and will execute faster.
Make sure that you have the correct indexes set up. If your database is used mostly for searching then consider more indexes.
Use the execution plan to see how the processing is done. What you want to avoid is a table scan as this is costly.
Make sure that the Auto Statistics is set to on. SQL needs this to help decide the optimal execution. See Mike Gunderloy's great post for more info. Basics of Statistics in SQL Server 2005
Make sure your indexes are not fragmented. Reducing SQL Server Index Fragmentation
Make sure your tables are not fragmented. How to Detect Table Fragmentation in SQL Server 2000 and 2005
Use a with statment to handle query filtering.
Limit each subquery to the minimum number of rows possible.
then join the subqueries.
WITH
master AS
(
SELECT SSN, FIRST_NAME, LAST_NAME
FROM MASTER_SSN
WHERE STATE = 'PA' AND
GENDER = 'M'
),
taxReturns AS
(
SELECT SSN, RETURN_ID, GROSS_PAY
FROM MASTER_RETURNS
WHERE YEAR < 2003 AND
YEAR > 2000
)
SELECT *
FROM master,
taxReturns
WHERE master.ssn = taxReturns.ssn
A subqueries within a with statement may end up as being the same as inline views,
or automatically generated temp tables. I find in the work I do, retail data, that about 70-80% of the time, there is a performance benefit.
100% of the time, there is a maintenance benefit.
I think using SQL query analyzer would be a good start.
In Oracle you can look at the explain plan to compare variations on your query
Make sure that you have the right indexes on the table. if you frequently use a column as a way to order or limit your dataset an index can make a big difference. I saw in a recent article that select distinct can really slow down a query, especially if you have no index.
The obvious optimization for SELECT queries is ensuring you have indexes on columns used for joins or in WHERE clauses.
Since adding indexes can slow down data writes you do need to monitor performance to ensure you don't kill the DB's write performance, but that's where using a good query analysis tool can help you balanace things accordingly.
Indexes
Statistics
on microsoft stack, Database Engine Tuning Advisor
Some other points (Mine are based on SQL server, since each db backend has it's own implementations they may or may not hold true for all databases):
Avoid correlated subqueries in the select part of a statement, they are essentially cursors.
Design your tables to use the correct datatypes to avoid having to apply functions on them to get the data out. It is far harder to do date math when you store your data as varchar for instance.
If you find that you are frequently doing joins that have functions in them, then you need to think about redesigning your tables.
If your WHERE or JOIN conditions include OR statements (which are slower) you may get better speed using a UNION statement.
UNION ALL is faster than UNION if (And only if) the two statments are mutually exclusive and return the same results either way.
NOT EXISTS is usually faster than NOT IN or using a left join with a WHERE clause of ID = null
In an UPDATE query add a WHERE condition to make sure you are not updating values that are already equal. The difference between updating 10,000,000 records and 4 can be quite significant!
Consider pre-calculating some values if you will be querying them frequently or for large reports. A sum of the values in an order only needs to be done when the order is made or adjusted, rather than when you are summarizing the results of 10,000,000 million orders in a report. Pre-calculations should be done in triggers so that they are always up-to-date is the underlying data changes. And it doesn't have to be just numbers either, we havea calculated field that concatenates names that we use in reports.
Be wary of scalar UDFs, they can be slower than putting the code in line.
Temp table tend to be faster for large data set and table variables faster for small ones. In addition you can index temp tables.
Formatting is usually faster in the user interface than in SQL.
Do not return more data than you actually need.
This one seems obvious but you would not believe how often I end up fixing this. Do not join to tables that you are not using to filter the records or actually calling one of the fields in the select part of the statement. Unnecessary joins can be very expensive.
It is an very bad idea to create views that call other views that call other views. You may find you are joining to the same table 6 times when you only need to once and creating 100,000,00 records in an underlying view in order to get the 6 that are in your final result.
In designing a database, think about reporting not just the user interface to enter data. Data is useless if it is not used, so think about how it will be used after it is in the database and how that data will be maintained or audited. That will often change the design. (This is one reason why it is a poor idea to let an ORM design your tables, it is only thinking about one use case for the data.) The most complex queries affecting the most data are in reporting, so designing changes to help reporting can speed up queries (and simplify them) considerably.
Database-specific implementations of features can be faster than using standard SQL (That's one of the ways they sell their product), so get to know your database features and find out which are faster.
And because it can't be said too often, use indexes correctly, not too many or too few. And make your WHERE clauses sargable (Able to use indexes).

Favourite performance tuning tricks [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
When you have a query or stored procedure that needs performance tuning, what are some of the first things you try?
Here is the handy-dandy list of things I always give to someone asking me about optimisation.
We mainly use Sybase, but most of the advice will apply across the board.
SQL Server, for example, comes with a host of performance monitoring / tuning bits, but if you don't have anything like that (and maybe even if you do) then I would consider the following...
99% of problems I have seen are caused by putting too many tables in a join. The fix for this is to do half the join (with some of the tables) and cache the results in a temporary table. Then do the rest of the query joining on that temporary table.
Query Optimisation Checklist
Run UPDATE STATISTICS on the underlying tables
Many systems run this as a scheduled weekly job
Delete records from underlying tables (possibly archive the deleted records)
Consider doing this automatically once a day or once a week.
Rebuild Indexes
Rebuild Tables (bcp data out/in)
Dump / Reload the database (drastic, but might fix corruption)
Build new, more appropriate index
Run DBCC to see if there is possible corruption in the database
Locks / Deadlocks
Ensure no other processes running in database
Especially DBCC
Are you using row or page level locking?
Lock the tables exclusively before starting the query
Check that all processes are accessing tables in the same order
Are indices being used appropriately?
Joins will only use index if both expressions are exactly the same data type
Index will only be used if the first field(s) on the index are matched in the query
Are clustered indices used where appropriate?
range data
WHERE field between value1 and value2
Small Joins are Nice Joins
By default the optimiser will only consider the tables 4 at a time.
This means that in joins with more than 4 tables, it has a good chance of choosing a non-optimal query plan
Break up the Join
Can you break up the join?
Pre-select foreign keys into a temporary table
Do half the join and put results in a temporary table
Are you using the right kind of temporary table?
#temp tables may perform much better than #table variables with large volumes (thousands of rows).
Maintain Summary Tables
Build with triggers on the underlying tables
Build daily / hourly / etc.
Build ad-hoc
Build incrementally or teardown / rebuild
See what the query plan is with SET SHOWPLAN ON
See what’s actually happenning with SET STATS IO ON
Force an index using the pragma: (index: myindex)
Force the table order using SET FORCEPLAN ON
Parameter Sniffing:
Break Stored Procedure into 2
call proc2 from proc1
allows optimiser to choose index in proc2 if #parameter has been changed by proc1
Can you improve your hardware?
What time are you running? Is there a quieter time?
Is Replication Server (or other non-stop process) running? Can you suspend it? Run it eg. hourly?
Have a pretty good idea of the optimal path of running the query in your head.
Check the query plan - always.
Turn on STATS, so that you can examine both IO and CPU performance. Focus on driving those numbers down, not necessarily the query time (as that can be influenced by other activity, cache, etc.).
Look for large numbers of rows coming into an operator, but small numbers coming out. Usually, an index would help by limiting the number of rows coming in (which saves disk reads).
Focus on the largest cost subtree first. Changing that subtree can often change the entire query plan.
Common problems I've seen are:
If there's a lot of joins, sometimes Sql Server will choose to expand the joins, and then apply WHERE clauses. You can usually fix this by moving the WHERE conditions into the JOIN clause, or a derived table with the conditions inlined. Views can cause the same problems.
Suboptimal joins (LOOP vs HASH vs MERGE). My rule of thumb is to use a LOOP join when the top row has very few rows compared to the bottom, a MERGE when the sets are roughly equal and ordered, and a HASH for everything else. Adding a join hint will let you test your theory.
Parameter sniffing. If you ran the stored proc with unrealistic values at first (say, for testing), then the cached query plan may be suboptimal for your production values. Running again WITH RECOMPILE should verify this. For some stored procs, especially those that deal with varying sized ranges (say, all dates between today and yesterday - which would entail an INDEX SEEK - or, all dates between last year and this year - which would be better off with an INDEX SCAN) you may have to run it WITH RECOMPILE every time.
Bad indentation...Okay, so Sql Server doesn't have an issue with this - but I sure find it impossible to understand a query until I've fixed up the formatting.
Slightly off topic but if you have control over these issues...
High level and High Impact.
For high IO environments make sure your disks are for either RAID 10 or RAID 0+1 or some nested implementation of raid 1 and raid 0.
Don't use drives less than 1500K.
Make sure your disks are only used for your Database. IE no logging no OS.
Turn off auto grow or similar feature. Let the database use all storage that is anticipated. Not necessarily what is currently being used.
design your schema and indexes for the type queries.
if it's a log type table (insert only) and must be in the DB don't index it.
if your doing allot of reporting (complex selects with many joins) then you should look at creating a data warehouse with a star or snowflake schema.
Don't be afraid of replicating data in exchange for performance!
CREATE INDEX
Assure there are indexes available for your WHERE and JOIN clauses. This will speed data access greatly.
If your environment is a data mart or warehouse, indexes should abound for almost any conceivable query.
In a transactional environment, the number of indexes should be lower and their definitions more strategic so that index maintenance doesn't drag down resources. (Index maintenance is when the leaves of an index must be changed to reflect a change in the underlying table, as with INSERT, UPDATE, and DELETE operations.)
Also, be mindful of the order of fields in the index - the more selective (higher cardinality) a field, the earlier in the index it should appear. For example, say you're querying for used automobiles:
SELECT i.make, i.model, i.price
FROM dbo.inventory i
WHERE i.color = 'red'
AND i.price BETWEEN 15000 AND 18000
Price generally has higher cardinality. There may be only a few dozen colors available, but quite possibly thousands of different asking prices.
Of these index choices, idx01 provides the faster path to satisfy the query:
CREATE INDEX idx01 ON dbo.inventory (price, color)
CREATE INDEX idx02 ON dbo.inventory (color, price)
This is because fewer cars will satisfy the price point than the color choice, giving the query engine far less data to analyze.
I've been known to have two very similar indexes differing only in the field order to speed queries (firstname, lastname) in one and (lastname, firstname) in the other.
Assuming MySQL here, use EXPLAIN to find out what is going on with the query, make sure that the indexes are being used as efficiently as possible and try to eliminate file sorts. High Performance MySQL: Optimization, Backups, Replication, and More is a great book on this topic as is MySQL Performance Blog.
A trick I recently learned is that SQL Server can update local variables as well as fields, in an update statement.
UPDATE table
SET #variable = column = #variable + otherColumn
Or the more readable version:
UPDATE table
SET
#variable = #variable + otherColumn,
column = #variable
I've used this to replace complicated cursors/joins when implementing recursive calculations, and also gained a lot in performance.
Here's details and example code that made fantastic improvements in performance:
Link
#Terrapin there are a few other differences between isnull and coalesce that are worth mentioning (besides ANSI compliance, which is a big one for me).
Coalesce vs. IsNull
Sometimes in SQL Server if you use an OR in a where clause it will really jack with performance. Instead of using the OR just do two selects and union them together. You get the same results at 1000x the speed.
Look at the where clause - verify use of indexes / verify nothing silly is being done
where SomeComplicatedFunctionOf(table.Column) = #param --silly
I'll generally start with the joins - I'll knock each one of them out of the query one at a time and re-run the query to get an idea if there's a particular join I'm having a problem with.
On all of my temp tables, I like to add unique constraints (where appropriate) to make indexes, and primary keys (almost always).
declare #temp table(
RowID int not null identity(1,1) primary key,
SomeUniqueColumn varchar(25) not null,
SomeNotUniqueColumn varchar(50) null,
unique(SomeUniqueColumn)
)
#DavidM
Assuming MySQL here, use EXPLAIN to find out what is going on with the query, make sure that the indexes are being used as efficiently as possible...
In SQL Server, execution plan gets you the same thing - it tells you what indexes are being hit, etc.
Not necessarily a SQL performance trick per se but definately related:
A good idea would be to use memcached where possible as it would be much faster just fetching the precompiled data directly from memory rather than getting it from the database. There's also a flavour of MySQL that got memcached built in (third party).
Make sure your index lengths are as small as possible. This allows the DB to read more keys at a time from the file system, thus speeding up your joins. I assume this works with all DB's, but I know it's a specific recommendation for MySQL.
I've made it a habit to always use bind variables. It's possible bind variables won't help if the RDBMS doesn't cache SQL statements. But if you don't use bind variables the RDBMS doesn't have a chance to reuse query execution plans and parsed SQL statements. The savings can be enormous: http://www.akadia.com/services/ora_bind_variables.html. I work mostly with Oracle, but Microsoft SQL Server works pretty much the same way.
In my experience, if you don't know whether or not you are using bind variables, you probably aren't. If your application language doesn't support them, find one that does. Sometimes you can fix query A by using bind variables for query B.
After that, I talk to our DBA to find out what's causing the RDBMS the most pain. Note that you shouldn't ask "Why is this query slow?" That's like asking your doctor to take out you appendix. Sure your query might be the problem, but it's just as likely that something else is going wrong. As developers, we we tend to think in terms of lines of code. If a line is slow, fix that line. But a RDBMS is a really complicated system and your slow query might be the symptom of a much larger problem.
Way too many SQL tuning tips are cargo cult idols. Most of the time the problem is unrelated or minimally related to the syntax you use, so it's normally best to use the cleanest syntax you can. Then you can start looking at ways to tune the database (not the query). Only tweak the syntax when that fails.
Like any performance tuning, always collect meaningful statistics. Don't use wallclock time unless it's the user experience you are tuning. Instead look at things like CPU time, rows fetched and blocks read off of disk. Too often people optimize for the wrong thing.
First step:
Look at the Query Execution Plan!
TableScan -> bad
NestedLoop -> meh warning
TableScan behind a NestedLoop -> DOOM!
SET STATISTICS IO ON
SET STATISTICS TIME ON
Running the query using WITH (NoLock) is pretty much standard operation in my place. Anyone caught running queries on the tens-of-gigabytes tables without it is taken out and shot.
Convert NOT IN queries to LEFT OUTER JOINS if possible. For example if you want to find all rows in Table1 that are unused by a foreign key in Table2 you could do this:
SELECT *
FROM Table1
WHERE Table1.ID NOT IN (
SELECT Table1ID
FROM Table2)
But you get much better performance with this:
SELECT Table1.*
FROM Table1
LEFT OUTER JOIN Table2 ON Table1.ID = Table2.Table1ID
WHERE Table2.ID is null
Index the table(s) by the clm(s) you filter by
Prefix all tables with dbo. to prevent recompilations.
View query plans and hunt for table/index scans.
In 2005, scour the management views for missing indexes.
I like to use
isnull(SomeColThatMayBeNull, '')
Over
coalesce(SomeColThatMayBeNull, '')
When I don't need the multiple argument support that coalesce gives you.
http://blog.falafel.com/2006/04/05/SQLServerArcanaISNULLVsCOALESCE.aspx
I look out for:
Unroll any CURSOR loops and convert into set based UPDATE / INSERT statements.
Look out for any application code that:
Calls an SP that returns a large set of records,
Then in the application, goes through each record and calls an SP with parameters to update records.
Convert this into a SP that does all the work in one transaction.
Any SP that does lots of string manipulation. It's evidence that the data is not structured correctly / normalised.
Any SP's that re-invent the wheel.
Any SP's that I can't understand what it's trying to do within a minute!
SET NOCOUNT ON
Usually the first line inside my stored procedures, unless I actually need to use ##ROWCOUNT.
In SQL Server, use the nolock directive. It allows the select command to complete without having to wait - usually other transactions to finish.
SELECT * FROM Orders (nolock) where UserName = 'momma'
Remove cursors wherever the are not neceesary.
Remove function calls in Sprocs where a lot of rows will call the function.
My colleague used function calls (getting lastlogindate from userid as example) to return very wide recordsets.
Tasked with optimisation, I replaced the function calls in the sproc with the function's code: I got many sprocs' running time down from > 20 seconds to < 1.
Don't prefix Stored Procedure names with "sp_" because system procedures all start with "sp_", and SQL Server will have to search harder to find your procedure when it gets called.
Dirty reads -
set transaction isolation level read uncommitted
Prevents dead locks where transactional integrity isn't absolutely necessary (which is usually true)
I always go to SQL Profiler (if it's a stored procedure with a lot of nesting levels) or the query execution planner (if it's a few SQL statements with no nesting) first. 90% of the time you can find the problem immediately with one of these two tools.