Optimization of large SQL Select query in odbc environment, comparison operators - sql

I'm currently designing a database application that executes SQL statements on a SQL Server linked to the PCs via ODBC drivers (SQL Native Client v10, Local Network, Network Latency >1ms, executed from withing a MS Access 2003 Environment).
I'm dealing with a peculiar select query that is executed often and has to iterate through an indexed table with about 1.5 million entries. Currently the query structure is this:
SELECT *
FROM table1
WHERE field1 = value1
AND field2 = value2
AND textfield1 LIKE '* value3 *'
AND (field3 = value3 OR field4 = value4 OR field5 = value5)
ORDER BY indexedField1 DESC
(Simplified for reading comprehension and understandability, the real query can have up to 4 bracketed AND connected OR blocks, and up to a total of 31 AND connected statements).
Currently this query takes about ~2s every time it gets executed. It returns somewhere between 1.000 and 15.000 records in usual production. I'm looking for a way to make it execute faster or to restructure it in a way to make it work faster.
Coworkers of mine have hinted at the fact that using LIKE operators might be performance inefficient and that restructuring the OR statements in brackets could bring additional performance.
Edit: Additional relevant information: the table that is being pulled from is VERY active, there is an entry roughly every 1-5 minutes into it.
So the final question is:
Given my parameters outlined above, is this version of the query the most simplistic I can get it.
Can I do something to otherwise speed up the query or the execution time thereof.

General query optimisation is beyond the scope of a single answer, though there may be some help to be had on http://dba.stackexchange.com. However, you should learn how to read a query plan and figure our your bottlenecks before you start optimising.
(The way I'd do that would be to take a few typical queries and look at their estimated execution plan through a tool like SQL Server Management Studio. You may have to try to dig out the real SQL Server query that's resulted from your Access query, which your DBA might be able to help with. I'm assuming that your Access query is actually being translated into a SQL Server query and run natively on the server; if it's not then that will be your big problem!)
I'm going to assume you've indexed every column used in every predicate in your WHERE clause and still have a problem. If that's the case, the suspect is likely to be:
AND textfield1 LIKE '* value3 *'
Because that can't use an index. (It's not SARGable, because it has a wildcard at the beginning, so an index won't be any help.)
If you can't rearrange your search or pre-calculate this particular predicate, then you basically have the problem that Full-Text Searching was designed to solve by tokenising and pre-indexing the words in your text, and that will probably be the best solution.

Related

Why would my parameterized query not use indexes correctly?

This is in SQL Server 2014, but I'm seeing the same behavior in 2008, 2012, and 2016, as well as Sybase ASE 15.7.
I have a simple query that looks like this:
SELECT myField
FROM myTable
WHERE someIndexedField = #myParam
If I run this query from SSMS (replacing #myParam with 'myValue'), the query runs in under a second, because someIndexedField is indexed, and lookups should be very fast.
However, if I create the same query as a parameterized query string in a C# program, the query takes 20 to 30 seconds. Analysis by the DBAs shows that the query plan is NOT using the index on the someIndexedField column, but it is instead doing a table scan.
Even stranger, if I do the exact same parameterized query, but instead change it slightly to this:
DECLARE #_myParam char(13)
SET #_myParam = #myParam
SELECT myField
FROM myTable
WHERE someIndexedField = #_myParam
...this version suddenly uses the index again, and performance is back up to sub-second response times. I see this same behavior in queries of various complexity, but not 100% consistently for different queries - sometimes the server DOES decide to use an index. It IS, however, consistent for any given query. I never know which queries will be affected before trying them, but if a given query has this problem, it ALWAYS has the problem. Also, a query that does not have the problem doesn't ever seem to develop it later.
Another odd thing I have noticed is that sometimes, changing the total length of the query will actually make a difference in how this behavior shows up. I had one example where adding extra carriage returns into the query (essentially double-spacing it) actually caused the server to suddenly start using indexes as expected. Literally NO CODE was changed. I was unable to pin down an exact length at which this happened, however. Also, this particular "solution" only seemed to work on Sybase ASE - I was unable to reproduce that one on SQL Server.
(Incidentally, I can also use a hint to push the server to use the appropriate index, and this also fixes the problem. However, index hints are generally not a good idea if you can avoid them, and it seems that the server should be perfectly capable of picking an index on its own, especially with such a simple query.)
What's going on here? Why is the first version running as though there were no indexes on the table? And why does simply putting the parameter into a locally defined variable suddenly cause indexes to be used?

Slow SQL query involving CONTAINS and OR

We’re having a problem we were hoping the good folks of Stack Overflow could help us with. We’re running SQL Server 2008 R2 and are having problems with a query that takes a very long time to run on a moderate set of data , about 100000 rows. We're using CONTAINS to search through xml files and LIKE on another column to support leading wild cards.
We’ve reproduced the problem with the following small query that takes about 35 seconds to run:
SELECT something FROM table1
WHERE (CONTAINS(TextColumn, '"WhatEver"') OR
DescriptionColumn LIKE '%WhatEver%')
Query plan:
If we modify the query above to using UNION instead, the running time drops from 35 seconds to < 1 seconds. We would like to avoid using this approach to solve the issue.
SELECT something FROM table1 WHERE (CONTAINS(TextColumn, '"WhatEver"')
UNION
(SELECT something FROM table1 WHERE (DescriptionColumn LIKE '%WhatEver%'))
Query plan:
The column that we’re using CONTAINS to search through is a column with type image and consists of xml files sized anywhere from 1k to 20k in size.
We have no good theories as to why the first query is so slow so we were hoping someone here would have something wise to say on the matter. The query plans don’t show anything out of the ordinary as far as we can tell. We've also rebuilt the indexes and statistics.
Is there anything blatantly obvious we’re overlooking here?
Thanks in advance for your time!
Why are you using DescriptionColumn LIKE '%WhatEver%' instead of CONTAINS(DescriptionColumn, '"WhatEver"')?
CONTAINS is obviously a Full-Text predicate and will use the SQL Server Full-Text engine to filter the search results, however LIKE is a "normal" SQL Server keyword and so SQL Server will not use the Full-Text engine to asist with this query - In this case because the LIKE term begins with a wildcard SQL Server will be unable to use any indexes to help with the query either which will most likely result in a table scan and / or poorer performance than using the Full-Text engine.
Its difficult impossible to tell without an execution plan, however my guess on whats happening would be:
The UNION variation of the query is performing a table scan against table1 - the table scan is not fast, however because there are relatively few rows in the table it is not performing that slowly (compared to a 35s benchmark).
In the OR variation of the query SQL Server is first using the Full-Text engine to filter based on the CONTAINS and then goes on to perform an RDI lookup on each matching row in the result to filter based on the LIKE predicate, however for some reason SQL Server has massively underestimated the number of rows (this can happen with certain types of predicate) and so goes on to perform several thousnad RDI lookups which ends up being incredibly slow (a table scan would have been much quicker).
To really understand whats going on you need to get a query plan.
Did you guys try this:
SELECT *
FROM table
WHERE CONTAINS((column1, column2, column3), '"*keyword*"')
Instead of this:
SELECT *
FROM table
WHERE CONTAINS(column1, '"*keyword*"')
OR CONTAINS(column2, '"*keyword*"')
OR CONTAINS(column3y, '"*keyword*"')
The first one is a lot faster.
I just ran into this. This is reportedly a bug on SQL server 2008 R2:
http://www.arcomit.co.uk/support/kb.aspx?kbid=000060
Your approach of using a UNION of two selects instead of an OR is the workaround they recommend in that article.

SQL Server Express performance issue

I know my questions will sound silly and probably nobody will have perfect answer but since I am in a complete dead-end with the situation it will make me feel better to post it here.
So...
I have a SQL Server Express database that's 500 Mb. It contains 5 tables and maybe 30 stored procedure. This database is use to store articles and is use for the Developer It web site. Normally the web pages load quickly, let's say 2 ou 3 sec. BUT, sqlserver process uses 100% of the processor for those 2 or 3 sec.
I try to find which stored procedure was the problem and I could not find one. It seems like every read into the table dans contains the articles (there are about 155,000 of them and 20 or so gets added every 15 minutes).
I added few indexes but without luck...
It is because the table is full text indexed ?
Should I have order with the primary key instead of date ? I never had any problems with ordering by dates....
Should I use dynamic SQL ?
Should I add the primary key into the URL of the articles ?
Should I use multiple indexes for separate columns or one big index ?
I you want more details or code bits, just ask for it.
Basically, every little hint is much appreciated.
Thanks.
If your index is not being used, then it usually indicates one of two problems:
Non-sargable predicate conditions, such as WHERE DATEPART(YY, Column) = <something>. Wrapping columns in a function will impair or eliminate the optimizer's ability to effectively use an index.
Non-covered columns in the output list, which is very likely if you're in the habit of writing SELECT * instead of SELECT specific_columns. If the index doesn't cover your query, then SQL Server needs to perform a RID/key lookup for every row, one by one, which can slow down the query so much that the optimizer just decides to do a table scan instead.
See if one of these might apply to your situation; if you're still confused, I'd recommend updating the question with more information about your schema, the data, and the queries that are slow. 500 MB is very small for a SQL database, so this shouldn't be slow. Also post what's in the execution plan.
Use SQL Profiler to capture a lot of typical queries used in your app. Then run the profiler results through index tuning wizard. That will tell you what indexes can be added to optimize.
Then look at the worst performing queries and analyze their execution plans manually.

Is a dynamic sql stored procedure a bad thing for lots of records?

I have a table with almost 800,000 records and I am currently using dynamic sql to generate the query on the back end. The front end is a search page which takes about 20 parameters and depending on if a parameter was chosen, it adds an " AND ..." to the base query. I'm curious as to if dynamic sql is the right way to go ( doesn't seem like it because it runs slow). I am contemplating on just creating a denormalized table with all my data. Is this a good idea or should I just build the query all together instead of building it piece by piece using the dynamic sql. Last thing, is there a way to speed up dynamic sql?
It is more likely that your indexing (or lack thereof) is causing the slowness than the dynamic SQL.
What does the execution plan look like? Is the same query slow when executed in SSMS? What about when it's in a stored procedure?
If your table is an unindexed heap, it will perform poorly as the number of records grows - this is regardless of the query, and a dynamic query can actually perform better as the table nature changes because a dynamic query is more likely to have its query plan re-evaluated when it's not in the cache. This is not normally an issue (and I would not classify it as a design advantage of dynamic queries) except in the early stages of a system when SPs have not been recompiled, but statistics and query plans are out of date, but the volume of data has just drastically changed.
Not the static one yet. I have with the dynamic query, but it does not give any optimizations. If I ran it with the static query and it gave suggestions, would applying them affect the dynamic query? – Xaisoft (41 mins ago)
Yes, the dynamic query (EXEC (#sql)) is probably not going to be analyzed unless you analyzed a workload file. – Cade Roux (33 mins ago)
When you have a search query across multiple tables that are joined, the columns with indexes need to be the search columns as well as the primary key/foreign key columns - but it depends on the cardinality of the various tables. The tuning analyzer should show this. – Cade Roux (22 mins ago)
I'd just like to point out that if you use this style of optional parameters:
AND (#EarliestDate is Null OR PublishedDate < #EarliestDate)
The query optimizer will have no idea whether the parameter is there or not when it produces the query plan. I have seen cases where the optimizer makes bad choices in these cases. A better solution is to build the sql that uses only the parameters you need. The optimizer will make the most efficient execution plan in these cases. Be sure to use parameterized queries so that they are reusable in the plan cache.
As previous answer, check your indexes and plan.
The question is whether you are using a stored procedure. It's not obvious from the way you worded it. A stored procedure creates a query plan when run, and keeps that plan until recompiled. With varying SQL, you may be stuck with a bad query plan. You could do several things:
1) Add WITH RECOMPILE to the SP definition, which will cause a new plan to be generated with every execution. This includes some overhead, which may be acceptable.
2) Use separate SP's, depending on the parameters provided. This will allow better query plan caching
3) Use client generated SQL. This will create a query plan each time. If you use parameterized queries, this may allow you to use cached query plans.
The only difference between "dynamic" and "static" SQL is the parsing/optimization phase. Once those are done, the query will run identically.
For simple queries, this parsing phase plus the network traffic turns out to be a significant percentage of the total transaction time, so it's good practice to try and reduce these times.
But for large, complicated queries, this processing is overall insignificant compared to the actual path chosen by the optimizer.
I would focus on optimizing the query itself, including perhaps denormalization if you feel that it's appropriate, though I wouldn't do that on a first go around myself.
Sometimes the denormalization can be done at "run time" in the application using cached lookup tables, for example, rather than maintaining this o the database.
Not a fan of dynamic Sql but if you are stuck with it, you should probably read this article:
http://www.sommarskog.se/dynamic_sql.html
He really goes in depth on the best ways to use dynamic SQL and the isues using it can create.
As others have said, indexing is the most likely culprit. In indexing, one thing people often forget to do is put an index on the FK fields. Since a PK creates an index automatically, many assume an FK will as well. Unfortunately creating an FK does nto create an index. So make sure that any fields you join on are indexed.
There may be better ways to create your dynamic SQL but without seeing the code it is hard to say. I would at least look to see if it is using subqueries and replace them with derived table joins instead. Also any dynamic SQl that uses a cursor is bound to be slow.
If the parameters are optional, a trick that's often used is to create a procedure like this:
CREATE PROCEDURE GetArticlesByAuthor (
#AuthorId int,
#EarliestDate datetime = Null )
AS
SELECT * --not in production code!
FROM Articles
WHERE AuthorId = #AuthorId
AND (#EarliestDate is Null OR PublishedDate < #EarliestDate)
There are some good examples of queries with optional search criteria here: How do I create a stored procedure that will optionally search columns?
As noted, if you are doing a massive query, Indexes are the first bottleneck to look at. Make sure that heavily queried columns are indexed. Also, make sure that your query checks all indexed parameters before it checks un-indexed parameters. This makes sure that the results are filtered down using indexes first and then does the slow linear search only if it has to. So if col2 is indexed but col1 is not, it should look as follows:
WHERE col2 = #col2 AND col1 = #col1
You may be tempted to go overboard with indexes as well, but keep in mind that too many indexes can cause slow writes and massive disk usage, so don't go too too crazy.
I avoid dynamic queries if I can for two reasons. One, they do not save the query plan, so the statement gets compiled each time. The other is that they are hard to manipulate, test, and troubleshoot. (They just look ugly).
I like Dave Kemp's answer above.
I've had some success (in a limited number of instances) with the following logic:
CREATE PROCEDURE GetArticlesByAuthor (
#AuthorId int,
#EarliestDate datetime = Null
) AS
SELECT SomeColumn
FROM Articles
WHERE AuthorId = #AuthorId
AND #EarliestDate is Null
UNION
SELECT SomeColumn
FROM Articles
WHERE AuthorId = #AuthorId
AND PublishedDate < #EarliestDate
If you are trying to optimize to below the 1s range, it may be important to gauge approximately how long it takes to parse and compile the dynamic sql relative to the actual query execution time:
SET STATISTICS TIME ON;
and then execute the dynamic SQL string "statically" and check the "Messages" tab. I was surprised by these results for a ~10 line dynamic sql query that returns two rows from a 1M row table:
SQL Server parse and compile time:
CPU time = 199 ms, elapsed time = 199 ms.
(2 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 4 ms.
Index optimization will doubtfully move the 199ms barrier much (except perhaps due to some analyzation/optimization included within the compile time).
However if the dynamic SQL uses parameters or is repeating than the compile results may be cached according to: See Caching Query Plans which would eliminate the compile time. Would be interesting to know how long cache entries live, size, shared between sessions, etc.

Favourite performance tuning tricks [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
When you have a query or stored procedure that needs performance tuning, what are some of the first things you try?
Here is the handy-dandy list of things I always give to someone asking me about optimisation.
We mainly use Sybase, but most of the advice will apply across the board.
SQL Server, for example, comes with a host of performance monitoring / tuning bits, but if you don't have anything like that (and maybe even if you do) then I would consider the following...
99% of problems I have seen are caused by putting too many tables in a join. The fix for this is to do half the join (with some of the tables) and cache the results in a temporary table. Then do the rest of the query joining on that temporary table.
Query Optimisation Checklist
Run UPDATE STATISTICS on the underlying tables
Many systems run this as a scheduled weekly job
Delete records from underlying tables (possibly archive the deleted records)
Consider doing this automatically once a day or once a week.
Rebuild Indexes
Rebuild Tables (bcp data out/in)
Dump / Reload the database (drastic, but might fix corruption)
Build new, more appropriate index
Run DBCC to see if there is possible corruption in the database
Locks / Deadlocks
Ensure no other processes running in database
Especially DBCC
Are you using row or page level locking?
Lock the tables exclusively before starting the query
Check that all processes are accessing tables in the same order
Are indices being used appropriately?
Joins will only use index if both expressions are exactly the same data type
Index will only be used if the first field(s) on the index are matched in the query
Are clustered indices used where appropriate?
range data
WHERE field between value1 and value2
Small Joins are Nice Joins
By default the optimiser will only consider the tables 4 at a time.
This means that in joins with more than 4 tables, it has a good chance of choosing a non-optimal query plan
Break up the Join
Can you break up the join?
Pre-select foreign keys into a temporary table
Do half the join and put results in a temporary table
Are you using the right kind of temporary table?
#temp tables may perform much better than #table variables with large volumes (thousands of rows).
Maintain Summary Tables
Build with triggers on the underlying tables
Build daily / hourly / etc.
Build ad-hoc
Build incrementally or teardown / rebuild
See what the query plan is with SET SHOWPLAN ON
See what’s actually happenning with SET STATS IO ON
Force an index using the pragma: (index: myindex)
Force the table order using SET FORCEPLAN ON
Parameter Sniffing:
Break Stored Procedure into 2
call proc2 from proc1
allows optimiser to choose index in proc2 if #parameter has been changed by proc1
Can you improve your hardware?
What time are you running? Is there a quieter time?
Is Replication Server (or other non-stop process) running? Can you suspend it? Run it eg. hourly?
Have a pretty good idea of the optimal path of running the query in your head.
Check the query plan - always.
Turn on STATS, so that you can examine both IO and CPU performance. Focus on driving those numbers down, not necessarily the query time (as that can be influenced by other activity, cache, etc.).
Look for large numbers of rows coming into an operator, but small numbers coming out. Usually, an index would help by limiting the number of rows coming in (which saves disk reads).
Focus on the largest cost subtree first. Changing that subtree can often change the entire query plan.
Common problems I've seen are:
If there's a lot of joins, sometimes Sql Server will choose to expand the joins, and then apply WHERE clauses. You can usually fix this by moving the WHERE conditions into the JOIN clause, or a derived table with the conditions inlined. Views can cause the same problems.
Suboptimal joins (LOOP vs HASH vs MERGE). My rule of thumb is to use a LOOP join when the top row has very few rows compared to the bottom, a MERGE when the sets are roughly equal and ordered, and a HASH for everything else. Adding a join hint will let you test your theory.
Parameter sniffing. If you ran the stored proc with unrealistic values at first (say, for testing), then the cached query plan may be suboptimal for your production values. Running again WITH RECOMPILE should verify this. For some stored procs, especially those that deal with varying sized ranges (say, all dates between today and yesterday - which would entail an INDEX SEEK - or, all dates between last year and this year - which would be better off with an INDEX SCAN) you may have to run it WITH RECOMPILE every time.
Bad indentation...Okay, so Sql Server doesn't have an issue with this - but I sure find it impossible to understand a query until I've fixed up the formatting.
Slightly off topic but if you have control over these issues...
High level and High Impact.
For high IO environments make sure your disks are for either RAID 10 or RAID 0+1 or some nested implementation of raid 1 and raid 0.
Don't use drives less than 1500K.
Make sure your disks are only used for your Database. IE no logging no OS.
Turn off auto grow or similar feature. Let the database use all storage that is anticipated. Not necessarily what is currently being used.
design your schema and indexes for the type queries.
if it's a log type table (insert only) and must be in the DB don't index it.
if your doing allot of reporting (complex selects with many joins) then you should look at creating a data warehouse with a star or snowflake schema.
Don't be afraid of replicating data in exchange for performance!
CREATE INDEX
Assure there are indexes available for your WHERE and JOIN clauses. This will speed data access greatly.
If your environment is a data mart or warehouse, indexes should abound for almost any conceivable query.
In a transactional environment, the number of indexes should be lower and their definitions more strategic so that index maintenance doesn't drag down resources. (Index maintenance is when the leaves of an index must be changed to reflect a change in the underlying table, as with INSERT, UPDATE, and DELETE operations.)
Also, be mindful of the order of fields in the index - the more selective (higher cardinality) a field, the earlier in the index it should appear. For example, say you're querying for used automobiles:
SELECT i.make, i.model, i.price
FROM dbo.inventory i
WHERE i.color = 'red'
AND i.price BETWEEN 15000 AND 18000
Price generally has higher cardinality. There may be only a few dozen colors available, but quite possibly thousands of different asking prices.
Of these index choices, idx01 provides the faster path to satisfy the query:
CREATE INDEX idx01 ON dbo.inventory (price, color)
CREATE INDEX idx02 ON dbo.inventory (color, price)
This is because fewer cars will satisfy the price point than the color choice, giving the query engine far less data to analyze.
I've been known to have two very similar indexes differing only in the field order to speed queries (firstname, lastname) in one and (lastname, firstname) in the other.
Assuming MySQL here, use EXPLAIN to find out what is going on with the query, make sure that the indexes are being used as efficiently as possible and try to eliminate file sorts. High Performance MySQL: Optimization, Backups, Replication, and More is a great book on this topic as is MySQL Performance Blog.
A trick I recently learned is that SQL Server can update local variables as well as fields, in an update statement.
UPDATE table
SET #variable = column = #variable + otherColumn
Or the more readable version:
UPDATE table
SET
#variable = #variable + otherColumn,
column = #variable
I've used this to replace complicated cursors/joins when implementing recursive calculations, and also gained a lot in performance.
Here's details and example code that made fantastic improvements in performance:
Link
#Terrapin there are a few other differences between isnull and coalesce that are worth mentioning (besides ANSI compliance, which is a big one for me).
Coalesce vs. IsNull
Sometimes in SQL Server if you use an OR in a where clause it will really jack with performance. Instead of using the OR just do two selects and union them together. You get the same results at 1000x the speed.
Look at the where clause - verify use of indexes / verify nothing silly is being done
where SomeComplicatedFunctionOf(table.Column) = #param --silly
I'll generally start with the joins - I'll knock each one of them out of the query one at a time and re-run the query to get an idea if there's a particular join I'm having a problem with.
On all of my temp tables, I like to add unique constraints (where appropriate) to make indexes, and primary keys (almost always).
declare #temp table(
RowID int not null identity(1,1) primary key,
SomeUniqueColumn varchar(25) not null,
SomeNotUniqueColumn varchar(50) null,
unique(SomeUniqueColumn)
)
#DavidM
Assuming MySQL here, use EXPLAIN to find out what is going on with the query, make sure that the indexes are being used as efficiently as possible...
In SQL Server, execution plan gets you the same thing - it tells you what indexes are being hit, etc.
Not necessarily a SQL performance trick per se but definately related:
A good idea would be to use memcached where possible as it would be much faster just fetching the precompiled data directly from memory rather than getting it from the database. There's also a flavour of MySQL that got memcached built in (third party).
Make sure your index lengths are as small as possible. This allows the DB to read more keys at a time from the file system, thus speeding up your joins. I assume this works with all DB's, but I know it's a specific recommendation for MySQL.
I've made it a habit to always use bind variables. It's possible bind variables won't help if the RDBMS doesn't cache SQL statements. But if you don't use bind variables the RDBMS doesn't have a chance to reuse query execution plans and parsed SQL statements. The savings can be enormous: http://www.akadia.com/services/ora_bind_variables.html. I work mostly with Oracle, but Microsoft SQL Server works pretty much the same way.
In my experience, if you don't know whether or not you are using bind variables, you probably aren't. If your application language doesn't support them, find one that does. Sometimes you can fix query A by using bind variables for query B.
After that, I talk to our DBA to find out what's causing the RDBMS the most pain. Note that you shouldn't ask "Why is this query slow?" That's like asking your doctor to take out you appendix. Sure your query might be the problem, but it's just as likely that something else is going wrong. As developers, we we tend to think in terms of lines of code. If a line is slow, fix that line. But a RDBMS is a really complicated system and your slow query might be the symptom of a much larger problem.
Way too many SQL tuning tips are cargo cult idols. Most of the time the problem is unrelated or minimally related to the syntax you use, so it's normally best to use the cleanest syntax you can. Then you can start looking at ways to tune the database (not the query). Only tweak the syntax when that fails.
Like any performance tuning, always collect meaningful statistics. Don't use wallclock time unless it's the user experience you are tuning. Instead look at things like CPU time, rows fetched and blocks read off of disk. Too often people optimize for the wrong thing.
First step:
Look at the Query Execution Plan!
TableScan -> bad
NestedLoop -> meh warning
TableScan behind a NestedLoop -> DOOM!
SET STATISTICS IO ON
SET STATISTICS TIME ON
Running the query using WITH (NoLock) is pretty much standard operation in my place. Anyone caught running queries on the tens-of-gigabytes tables without it is taken out and shot.
Convert NOT IN queries to LEFT OUTER JOINS if possible. For example if you want to find all rows in Table1 that are unused by a foreign key in Table2 you could do this:
SELECT *
FROM Table1
WHERE Table1.ID NOT IN (
SELECT Table1ID
FROM Table2)
But you get much better performance with this:
SELECT Table1.*
FROM Table1
LEFT OUTER JOIN Table2 ON Table1.ID = Table2.Table1ID
WHERE Table2.ID is null
Index the table(s) by the clm(s) you filter by
Prefix all tables with dbo. to prevent recompilations.
View query plans and hunt for table/index scans.
In 2005, scour the management views for missing indexes.
I like to use
isnull(SomeColThatMayBeNull, '')
Over
coalesce(SomeColThatMayBeNull, '')
When I don't need the multiple argument support that coalesce gives you.
http://blog.falafel.com/2006/04/05/SQLServerArcanaISNULLVsCOALESCE.aspx
I look out for:
Unroll any CURSOR loops and convert into set based UPDATE / INSERT statements.
Look out for any application code that:
Calls an SP that returns a large set of records,
Then in the application, goes through each record and calls an SP with parameters to update records.
Convert this into a SP that does all the work in one transaction.
Any SP that does lots of string manipulation. It's evidence that the data is not structured correctly / normalised.
Any SP's that re-invent the wheel.
Any SP's that I can't understand what it's trying to do within a minute!
SET NOCOUNT ON
Usually the first line inside my stored procedures, unless I actually need to use ##ROWCOUNT.
In SQL Server, use the nolock directive. It allows the select command to complete without having to wait - usually other transactions to finish.
SELECT * FROM Orders (nolock) where UserName = 'momma'
Remove cursors wherever the are not neceesary.
Remove function calls in Sprocs where a lot of rows will call the function.
My colleague used function calls (getting lastlogindate from userid as example) to return very wide recordsets.
Tasked with optimisation, I replaced the function calls in the sproc with the function's code: I got many sprocs' running time down from > 20 seconds to < 1.
Don't prefix Stored Procedure names with "sp_" because system procedures all start with "sp_", and SQL Server will have to search harder to find your procedure when it gets called.
Dirty reads -
set transaction isolation level read uncommitted
Prevents dead locks where transactional integrity isn't absolutely necessary (which is usually true)
I always go to SQL Profiler (if it's a stored procedure with a lot of nesting levels) or the query execution planner (if it's a few SQL statements with no nesting) first. 90% of the time you can find the problem immediately with one of these two tools.