Better way to update TSQL

We are in the process of some data integration and I get update scripts in the form of
UPDATE Table1 SET Table1.field1 = '12345' WHERE Table1.field2 = '345667';
UPDATE Table1 SET Table1.field1 = '12365' WHERE Table1.field2 = '567885';
Table1.field2 is not indexed.
The scripts run without problems, but it takes forever. 8000 rows affected in a bit over 7 minutes, which I feel is a bit long. (It's running on a dev server which is not the best, but a look at the server doesn't indicate that it is overly busy).
So my question is: is there a better (i.e. faster) way to run this type of update statement? (SQL 2008 R2)
Many Thanks!

You may be RBAR-ing (row-by-agonizing-row) the server with multiple UPDATE statements. Essentially, you're having it do a table scan for each query, which is obviously non-ideal. While an index would help the most, executing multiple single-value statements will still cost you.
SQL Server allows you to use JOINs for update statements, so you may see some improvement doing something like this:
WITH Incoming AS (SELECT field1, field2
FROM (VALUES('12345', '345667'),
('12365', '567885')) i(field1, field2))
UPDATE Table1
SET Table1.field1 = Incoming.field1
FROM Table1
JOIN Incoming
ON Incoming.field2 = Table1.field2;
If it turns out that the number of rows in Incoming is large, you should probably materialize it as an actual table that you bulk-load into first. You should be able to put an index on the load table (rebuilt after the import, so its statistics are correct).
But really, an index on field2 should probably be the first thing, especially if there are multiple queries that use that column.
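As a rough sketch of both suggestions (the staging-table and index names below are invented, and the column types are guesses to adapt to Table1):
-- Staging table for the incoming pairs; match the real data types of Table1.
CREATE TABLE dbo.IncomingLoad (field1 VARCHAR(20) NOT NULL, field2 VARCHAR(20) NOT NULL);
-- Bulk-load the incoming values (BULK INSERT / bcp / SSIS), then index the join column.
CREATE INDEX IX_IncomingLoad_field2 ON dbo.IncomingLoad (field2);
-- The index that helps most, if you are allowed to add it:
CREATE INDEX IX_Table1_field2 ON dbo.Table1 (field2);
-- One set-based update instead of 8000 single-row statements.
UPDATE t
SET t.field1 = i.field1
FROM dbo.Table1 AS t
JOIN dbo.IncomingLoad AS i
  ON i.field2 = t.field2;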

As I see it, you can try different things (like storing all values on another table, then updating one using the other), but in the end, the engine is going to search using a single field, testing equality with a value.
That would require an index. If you can at least test in dev, maybe you can show the performance improvement to someone who can authorize the creation of the new index in the production environment.
That's my answer, I hope someone comes up with a better one!

Related

How to eliminate multiple server calls | MS SQL Server

There is a stored procedure that needs to be modified to eliminate a call to another server.
What is the easiest, feasible way to do this so that the final SP executes faster, with preference for solutions that do not involve much change to the application?
Eg:
select *
from dbo.table1 a
inner join server2.dbo.table2 b on a.id = b.id
Cross server JOINs can be problematic as the optimiser doesn't always pick the most effective solution, which may even result in the entire remote table being dragged over your network to be queried for a single row.
Replication is by far the best option, if you can justify it. This will mean you need to have a primary key on the table you want to replicate, which seems a reasonable constraint (ha!), but might become an issue with a third-party system.
If the remote table is small then it might be better to take a temporary local copy, e.g. SELECT * INTO #temp FROM server2.<database>.dbo.table2;. Then you can change your query to something like this: select * from dbo.table1 a inner join #temp b on a.id = b.id;. The temporary table will be marked for garbage collection when your session ends, so there's no need to tidy up after yourself.
If the table is larger then you might want to do the above, but also add an index to your temporary table, e.g. CREATE INDEX ix$temp ON #temp (id);. Note that if you use a named index then you will have issues if you run the same procedure twice simultaneously, as the index name won't be unique. This isn't a problem if the execution is always in series.
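Putting those two steps together, a rough sketch (the database name is a placeholder):
-- Local copy of the remote table.
SELECT *
INTO #temp
FROM server2.SomeDatabase.dbo.table2;
-- Index the join column; remember the name-collision caveat above if this can run twice at once.
CREATE INDEX ix$temp ON #temp (id);
-- Join locally instead of across the linked server.
SELECT *
FROM dbo.table1 a
INNER JOIN #temp b ON a.id = b.id;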
If you have a small number of ids that you want to include then OPENQUERY might be the way to go, e.g. SELECT * FROM OPENQUERY('server2', 'SELECT * FROM table2 WHERE id IN (''1'', ''2'')');. The advantage here is that you are now running the query on the remote server, so it's more likely to use a more efficient query plan.
The bottom line is that if you expect to be able to JOIN a remote and local table then you will always have some level of uncertainty; even if the query runs well one day, it might suddenly decide to run a LOT slower the following day. Small things, like adding a single row of data to the remote table, can completely change the way the query is executed.

Optimization of large SQL Select query in odbc environment, comparison operators

I'm currently designing a database application that executes SQL statements on a SQL Server linked to the PCs via ODBC drivers (SQL Native Client v10, local network, network latency >1ms, executed from within an MS Access 2003 environment).
I'm dealing with a peculiar select query that is executed often and has to iterate through an indexed table with about 1.5 million entries. Currently the query structure is this:
SELECT *
FROM table1
WHERE field1 = value1
AND field2 = value2
AND textfield1 LIKE '* value3 *'
AND (field3 = value3 OR field4 = value4 OR field5 = value5)
ORDER BY indexedField1 DESC
(Simplified for readability; the real query can have up to 4 bracketed AND-connected OR blocks, and up to 31 AND-connected predicates in total.)
Currently this query takes about 2 seconds every time it gets executed. It returns somewhere between 1,000 and 15,000 records in usual production. I'm looking for a way to make it execute faster or to restructure it so that it works faster.
Coworkers of mine have hinted that using LIKE operators might be inefficient and that restructuring the bracketed OR clauses could bring additional performance.
Edit: Additional relevant information: the table that is being pulled from is VERY active, there is an entry roughly every 1-5 minutes into it.
So the final question is:
Given my parameters outlined above, is this version of the query the simplest I can get it?
Can I do something to otherwise speed up the query or its execution time?
General query optimisation is beyond the scope of a single answer, though there may be some help to be had on http://dba.stackexchange.com. However, you should learn how to read a query plan and figure out your bottlenecks before you start optimising.
(The way I'd do that would be to take a few typical queries and look at their estimated execution plan through a tool like SQL Server Management Studio. You may have to try to dig out the real SQL Server query that's resulted from your Access query, which your DBA might be able to help with. I'm assuming that your Access query is actually being translated into a SQL Server query and run natively on the server; if it's not then that will be your big problem!)
I'm going to assume you've indexed every column used in every predicate in your WHERE clause and still have a problem. If that's the case, the suspect is likely to be:
AND textfield1 LIKE '* value3 *'
Because that can't use an index. (It's not SARGable, because it has a wildcard at the beginning, so an index won't be any help.)
If you can't rearrange your search or pre-calculate this particular predicate, then you basically have the problem that Full-Text Searching was designed to solve by tokenising and pre-indexing the words in your text, and that will probably be the best solution.
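For a rough idea of what that would involve (catalog and key-index names below are invented, a unique key index on the table is required, and the other predicates are omitted for brevity):
-- One-time setup; names are illustrative.
CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT;
CREATE FULLTEXT INDEX ON dbo.table1 (textfield1) KEY INDEX PK_table1;
-- The non-SARGable LIKE becomes a CONTAINS predicate that can use the full-text index.
SELECT *
FROM dbo.table1
WHERE field1 = 'value1'
  AND field2 = 'value2'
  AND CONTAINS(textfield1, '"value3"')
ORDER BY indexedField1 DESC;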

SQL UPDATE WHERE IN (List) or UPDATE each individually?

I'm doing my best lately to look for the best way to run certain queries in SQL that could potentially be done multiple different ways. Among my research I've come across quite a lot of hate for the WHERE IN concept, due to an inherent inefficiency in how it works.
eg: WHERE Col IN (val1, val2, val3)
In my current project, I'm doing an UPDATE on a large set of data and am wondering which of the following is more efficient: (or whether a better option exists)
UPDATE table1 SET somecolumn = 'someVal' WHERE ID IN (id1, id2, id3 ....);
In the above, the list of ID's can be up to 1.5k ID's.
VS
Looping through all ID's in code, and running the following statement for each:
UPDATE table1 SET somecolumn = 'someVal' WHERE ID = 'theID';
To myself, it seems more logical that the former would work better / faster, because there are fewer queries to run. That said, I'm not 100% familiar with the ins and outs of SQL and how query queueing works.
I'm also unsure as to which would be friendlier on the DB as far as table locks and other general performance.
General info in case it helps, I'm using Microsoft SQL Server 2014, and the primary development language is C#.
Any help is much appreciated.
EDIT:
Option 3:
UPDATE table1 SET somecolumn = 'someVal' WHERE ID IN (SELECT ID FROM @definedTable);
In the above, @definedTable is a SQL 'User-Defined Table Type', where the data inside comes through to a stored procedure as (in C#) type SqlDbType.Structured
People are asking how the ID's come in:
IDs are in a List<string> in the code, and are used for other things in the code before being sent to a stored procedure. Currently, the IDs are coming into the stored procedure as a 'User-Defined Table Type' with only one column (IDs).
I thought having them in a table might be better than having the code concatenate a massive string and just spit it into the SP as a variable that looks like id1, id2, id3, id4 etc.
I'm using your third option and it works great.
My stored procedure has a table-valued parameter. See also Use Table-Valued Parameters.
In the procedure there is one statement, no loops, like you said:
UPDATE table1 SET somecolumn = 'someVal' WHERE ID IN (SELECT ID FROM @definedTable);
It is better to call the procedure once, than 1,500 times. It is better to have one transaction, than 1,500 transactions.
If the number of rows in the @definedTable goes above, say, 10K, I'd consider splitting it in batches of 10K.
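For reference, a minimal sketch of that table type and procedure (the type, procedure, and parameter names are illustrative, and the ID type should match table1):
-- Table type carrying the list of IDs.
CREATE TYPE dbo.IdList AS TABLE (ID NVARCHAR(50) NOT NULL PRIMARY KEY);
GO
-- Procedure receiving the list as a table-valued parameter (SqlDbType.Structured from C#).
CREATE PROCEDURE dbo.UpdateSomeColumn
    @definedTable dbo.IdList READONLY,
    @someVal      NVARCHAR(50)
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE t
    SET t.somecolumn = @someVal
    FROM table1 AS t
    WHERE t.ID IN (SELECT ID FROM @definedTable);
END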
Your first variant is OK for few values in the IN clause, but when you get to really high numbers (60K+) you can see something like this, as shown in this answer:
Msg 8623, Level 16, State 1, Line 1
The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.
Your first or third options are the best way to go. For either of them, you want an index on table1(id).
In general, it is better to run one query rather than multiple queries because the overhead of passing data in and out of the database adds up. In addition, each update starts a transaction and commits it -- more overhead. That said, this will probably not be important unless you are updating thousands of records. The overhead is measured in hundreds of microseconds or milliseconds on a typical system.
You should definitely NOT use a loop and send an entire new SQL statement for each ID. In that case, the SQL engine has to recompile the SQL statement and come up with an execution plan, etc. every single time.
Probably the best thing to do is to make a prepared statement with a placeholder then loop through your data executing the statement for each value. Then the statement stays in the database engine's memory and it quickly just executes it with the new value each time you call it rather than start from scratch.
If you have a large database and/or run this often, also make sure you create an index on that ID value, otherwise it will have to do a full table scan with every value.
EDIT:
Perl pseudocode as described below:
#!/usr/bin/perl
use DBI;
$dbh = DBI->connect('dbi:Oracle:MY_DB', 'scott', 'tiger', { RaiseError => 1, PrintError =>1, AutoCommit => 0 });
$sth = $dbh->prepare ("UPDATE table1 SET somecolumn = ? WHERE id = ?");
foreach $tuple (@updatetuples) {
$sth->execute($$tuple[1], $$tuple[0]);
}
$dbh->commit;
$sth->finish;
$dbh->disconnect;
exit (0);
I came upon this post when trying to solve a very similar problem so thought I'd share what I found. My answer uses the case keyword, and applies to when you are trying to run an update for a list of key-value pairs (not when you are trying to update a bunch of rows to a single value). Normally I'd just run an update query and join the relevant tables, but I am using SQLite rather than MySQL and SQLite doesn't support joined update queries as well as MySQL. You can do something like this:
UPDATE mytable
SET somefield = (CASE WHEN (id = 100) THEN 'some value 1'
                      WHEN (id = 101) THEN 'some value 2'
                 END)
WHERE id IN (100, 101);

Weird SQL Server query problem

So I have this weird problem with an SQL Server stored procedure. Basically I have this long and complex procedure. Something like this:
SELECT Table1.col1, Table2.col2, col3
FROM Table1 INNER JOIN Table2
Table2 INNER JOIN Table3
-----------------------
-----------------------
(Lots more joins)
WHERE Table1.Col1 = dbo.fnGetSomeID() AND (More checks)
-----------------------
-----------------------
(6-7 More queries like this with the same check)
The problem is that check in the WHERE clause at the end Table1.Col1 = dbo.fnGetSomeID(). The function dbo.fnGetSomeID() returns a simple integer value 1. So when I hardcode the value 1 where the function call should be the SP takes only about 15 seconds. BUT when I replace it with that function call in the WHERE clause it takes around 3.5 minutes.
So I do this:
DECLARE @SomeValue INT
SET @SomeValue = dbo.fnGetSomeID()
--Where clause changed
WHERE Table1.Col1 = @SomeValue
So now the function is only called once. But still the same 3.5 minutes. So I go ahead and do this:
DECLARE @SomeValue INT
--Removed the function, replaced it with 1
SET @SomeValue = 1
--Where clause changed
WHERE Table1.Col1 = @SomeValue
And still it takes 3.5 minutes. Why the performance impact? And how to make it go away?
Even with @SomeValue set at 1, when you have
WHERE Table1.Col1 = @SomeValue
SQL Server probably still views @SomeValue as a variable, not as a hardcoded 1, and that would affect the query plan accordingly. And since Table1 is linked to Table2, and Table2 is linked to Table3, etc., the amount of time to run the query is magnified. On the other hand, when you have
WHERE Table1.Col1 = 1
The query plan gets locked in with Table1.Col1 at a constant value of 1. Just because we see
WHERE Table1.Col1 = @SomeValue
as 'hardcoding', doesn't mean SQL sees it the same way. Every possible cartesian product is a candidate and @SomeValue needs to be evaluated for each.
So, the standard recommendations apply - check your execution plan, rewrite the query if needed.
Also, are those join columns indexed?
As is mentioned elsewhere, there will be execution plan differences depending on which approach you take. I'd look at both execution plans to see if there's an obvious answer there.
This question described a similar problem, and the answer in that case turned out to involve connection settings.
I've also run into almost the exact same problem as this myself, and what I found out in that case was that using the newer constructs (analytic functions in SQL 2008) was apparently confusing the optimizer. This may not be the case for you, as you're using SQL 2005, but something similar might be going on depending on the rest of your query.
One other thing to look at is whether you have a biased distribution of values for Table1.Col1 -- if the optimizer uses a general execution plan when you use the function or the variable rather than the constant, that might lead it to choose suboptimal joins compared to when it can clearly see that the value is one specific constant.
If all else fails, and this query is not inside another UDF, you can precalculate the fnGetSomeID() UDF's value like you were doing, then wrap the whole query in dynamic SQL, providing the value as a constant in the SQL string. That should give you the faster performance, at the cost of recompiling the query every time (which should be a good trade in this case).
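A rough sketch of that dynamic-SQL approach, with the joins and checks abbreviated:
DECLARE @SomeValue INT
SET @SomeValue = dbo.fnGetSomeID()

DECLARE @sql NVARCHAR(MAX)
SET @sql = N'SELECT Table1.col1 FROM Table1 WHERE Table1.Col1 = '
         + CAST(@SomeValue AS NVARCHAR(10))
-- append the rest of the joins and checks to @sql here

EXEC (@sql)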
Another thing to try.
Instead of loading the id into a variable, load it into a table
if object_id('myTable') is not null drop table myTable
select dbo.fnGetSomeID() as myID into myTable
and then use
WHERE Table1.Col1 = (select myID from myTable)
in your query.
You could try the OPTIMIZE FOR hint to force a plan for a given constant, but it may have inconsistent results; in 2008 you can use OPTIMIZE FOR UNKNOWN
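For reference, a sketch of the hint applied to the variable form of the query (joins omitted):
DECLARE @SomeValue INT
SET @SomeValue = dbo.fnGetSomeID()

SELECT Table1.col1
FROM Table1
WHERE Table1.Col1 = @SomeValue
OPTION (OPTIMIZE FOR (@SomeValue = 1))   -- on 2008+: OPTION (OPTIMIZE FOR UNKNOWN)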
I think that since the optimizer has no idea how much work the function does, it tries to evaluate it last.
I would try storing the return value of the function in a variable ahead of time, and using that in your where clause.
Also, you might want to try schema binding your function, because apparently sometimes it seriously affects performance.
You can make your function schema bound like so:
create function fnGetSomeID()
returns int
with schemabinding
as
... etc.
This is not a nice problem to have. It shouldn't matter, finally, whether the value is returned by a function or subquery or variable or is a constant. But it does, and at some level of complexity it's very hard to get consistent results. And you can't really debug it, because neither you nor anyone else here can peer inside the black box that is the query optimizer. All you can do is poke at it and see how it behaves.
I think the query optimizer is behaving erratically because there are many tables in the query. When you tell it to look for 1 it looks at the index statistics and makes a good choice. When you tell it anything else, it assumes it should join based on what it does know, not trusting your function/variable to return a selective value. For that to be true, Table1.Col1 must have an uneven distribution of values. Or the query optimizer is not, um, optimal.
Either way, the estimated query plan should show a difference. Look for opportunities to add (or, sometimes, remove) an index. It could be that the 3.5-minute plan is reasonable in a lot of cases, and what the server really wants is better indexes.
Beyond that is guesswork. Sometimes, sad to say, the answer lies in finding the subset of tables that produce a small set of rows, putting them in a temporary table, and joining that to the rest of the tables. The OPTIMIZE FOR hint might be useful, too.
Keep in mind, though, that any solution you come up with will be fragile, data- and version-dependent.

Favourite performance tuning tricks [closed]

When you have a query or stored procedure that needs performance tuning, what are some of the first things you try?
Here is the handy-dandy list of things I always give to someone asking me about optimisation.
We mainly use Sybase, but most of the advice will apply across the board.
SQL Server, for example, comes with a host of performance monitoring / tuning bits, but if you don't have anything like that (and maybe even if you do) then I would consider the following...
99% of problems I have seen are caused by putting too many tables in a join. The fix for this is to do half the join (with some of the tables) and cache the results in a temporary table. Then do the rest of the query joining on that temporary table.
Query Optimisation Checklist
Run UPDATE STATISTICS on the underlying tables
Many systems run this as a scheduled weekly job
Delete records from underlying tables (possibly archive the deleted records)
Consider doing this automatically once a day or once a week.
Rebuild Indexes
Rebuild Tables (bcp data out/in)
Dump / Reload the database (drastic, but might fix corruption)
Build new, more appropriate index
Run DBCC to see if there is possible corruption in the database
Locks / Deadlocks
Ensure no other processes running in database
Especially DBCC
Are you using row or page level locking?
Lock the tables exclusively before starting the query
Check that all processes are accessing tables in the same order
Are indices being used appropriately?
Joins will only use index if both expressions are exactly the same data type
Index will only be used if the first field(s) on the index are matched in the query
Are clustered indices used where appropriate?
range data
WHERE field between value1 and value2
Small Joins are Nice Joins
By default the optimiser will only consider the tables 4 at a time.
This means that in joins with more than 4 tables, it has a good chance of choosing a non-optimal query plan
Break up the Join
Can you break up the join?
Pre-select foreign keys into a temporary table
Do half the join and put results in a temporary table
Are you using the right kind of temporary table?
#temp tables may perform much better than @table variables with large volumes (thousands of rows).
Maintain Summary Tables
Build with triggers on the underlying tables
Build daily / hourly / etc.
Build ad-hoc
Build incrementally or teardown / rebuild
See what the query plan is with SET SHOWPLAN ON
See what’s actually happening with SET STATS IO ON
Force an index using the pragma: (index: myindex)
Force the table order using SET FORCEPLAN ON
Parameter Sniffing:
Break Stored Procedure into 2
call proc2 from proc1
allows optimiser to choose index in proc2 if @parameter has been changed by proc1 (see the sketch after this list)
Can you improve your hardware?
What time are you running? Is there a quieter time?
Is Replication Server (or other non-stop process) running? Can you suspend it? Run it eg. hourly?
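As referenced under Parameter Sniffing above, a sketch of the split-procedure pattern (procedure, table, and column names are invented):
-- The actual query lives in proc2, which compiles its plan for the value it is passed.
CREATE PROCEDURE dbo.proc2
    @parameter INT
AS
BEGIN
    SELECT *
    FROM dbo.SomeTable
    WHERE SomeColumn = @parameter   -- optimiser can now pick an index for the real value
END
GO
-- proc1 adjusts the parameter and only then calls proc2.
CREATE PROCEDURE dbo.proc1
    @parameter INT
AS
BEGIN
    IF @parameter IS NULL
        SET @parameter = 0          -- the parameter changes inside the procedure
    EXEC dbo.proc2 @parameter
END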
Have a pretty good idea of the optimal path of running the query in your head.
Check the query plan - always.
Turn on STATS, so that you can examine both IO and CPU performance. Focus on driving those numbers down, not necessarily the query time (as that can be influenced by other activity, cache, etc.).
Look for large numbers of rows coming into an operator, but small numbers coming out. Usually, an index would help by limiting the number of rows coming in (which saves disk reads).
Focus on the largest cost subtree first. Changing that subtree can often change the entire query plan.
Common problems I've seen are:
If there's a lot of joins, sometimes Sql Server will choose to expand the joins, and then apply WHERE clauses. You can usually fix this by moving the WHERE conditions into the JOIN clause, or a derived table with the conditions inlined. Views can cause the same problems.
Suboptimal joins (LOOP vs HASH vs MERGE). My rule of thumb is to use a LOOP join when the top input has very few rows compared to the bottom, a MERGE when the sets are roughly equal and ordered, and a HASH for everything else. Adding a join hint will let you test your theory.
Parameter sniffing. If you ran the stored proc with unrealistic values at first (say, for testing), then the cached query plan may be suboptimal for your production values. Running again WITH RECOMPILE should verify this. For some stored procs, especially those that deal with varying sized ranges (say, all dates between today and yesterday - which would entail an INDEX SEEK - or, all dates between last year and this year - which would be better off with an INDEX SCAN) you may have to run it WITH RECOMPILE every time.
Bad indentation...Okay, so Sql Server doesn't have an issue with this - but I sure find it impossible to understand a query until I've fixed up the formatting.
Slightly off topic but if you have control over these issues...
High level and High Impact.
For high IO environments make sure your disks are either RAID 10 or RAID 0+1 or some nested implementation of RAID 1 and RAID 0.
Don't use drives less than 1500K.
Make sure your disks are only used for your database, i.e. no logging, no OS.
Turn off auto grow or similar feature. Let the database use all storage that is anticipated. Not necessarily what is currently being used.
Design your schema and indexes for the types of queries you run.
If it's a log-type table (insert only) and must be in the DB, don't index it.
If you're doing a lot of reporting (complex selects with many joins) then you should look at creating a data warehouse with a star or snowflake schema.
Don't be afraid of replicating data in exchange for performance!
CREATE INDEX
Assure there are indexes available for your WHERE and JOIN clauses. This will speed data access greatly.
If your environment is a data mart or warehouse, indexes should abound for almost any conceivable query.
In a transactional environment, the number of indexes should be lower and their definitions more strategic so that index maintenance doesn't drag down resources. (Index maintenance is when the leaves of an index must be changed to reflect a change in the underlying table, as with INSERT, UPDATE, and DELETE operations.)
Also, be mindful of the order of fields in the index - the more selective (higher cardinality) a field, the earlier in the index it should appear. For example, say you're querying for used automobiles:
SELECT i.make, i.model, i.price
FROM dbo.inventory i
WHERE i.color = 'red'
AND i.price BETWEEN 15000 AND 18000
Price generally has higher cardinality. There may be only a few dozen colors available, but quite possibly thousands of different asking prices.
Of these index choices, idx01 provides the faster path to satisfy the query:
CREATE INDEX idx01 ON dbo.inventory (price, color)
CREATE INDEX idx02 ON dbo.inventory (color, price)
This is because fewer cars will satisfy the price point than the color choice, giving the query engine far less data to analyze.
I've been known to have two very similar indexes differing only in the field order to speed queries (firstname, lastname) in one and (lastname, firstname) in the other.
Assuming MySQL here, use EXPLAIN to find out what is going on with the query, make sure that the indexes are being used as efficiently as possible and try to eliminate file sorts. High Performance MySQL: Optimization, Backups, Replication, and More is a great book on this topic as is MySQL Performance Blog.
A trick I recently learned is that SQL Server can update local variables as well as fields, in an update statement.
UPDATE table
SET @variable = column = @variable + otherColumn
Or the more readable version:
UPDATE table
SET
@variable = @variable + otherColumn,
column = @variable
I've used this to replace complicated cursors/joins when implementing recursive calculations, and also gained a lot in performance.
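As an illustration, a minimal sketch of the running-total form of this trick (table and column names are invented; the processing order is only dependable with a suitable clustered index, so treat it as a sketch):
DECLARE @running MONEY
SET @running = 0

-- Each row's RunningTotal is assigned from the variable, which accumulates as the update walks the rows.
UPDATE dbo.Ledger
SET @running = RunningTotal = @running + Amount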
@Terrapin there are a few other differences between isnull and coalesce that are worth mentioning (besides ANSI compliance, which is a big one for me).
Coalesce vs. IsNull
Sometimes in SQL Server if you use an OR in a where clause it will really jack with performance. Instead of using the OR just do two selects and union them together. You get the same results at 1000x the speed.
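A sketch of that rewrite (table and column names are invented):
-- The OR form, which can defeat index usage:
SELECT OrderID FROM dbo.Orders WHERE CustomerID = 42 OR SalesRepID = 7

-- Rewritten as two index-friendly selects; UNION also removes duplicates (use UNION ALL to keep them).
SELECT OrderID FROM dbo.Orders WHERE CustomerID = 42
UNION
SELECT OrderID FROM dbo.Orders WHERE SalesRepID = 7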
Look at the where clause - verify use of indexes / verify nothing silly is being done
where SomeComplicatedFunctionOf(table.Column) = @param --silly
I'll generally start with the joins - I'll knock each one of them out of the query one at a time and re-run the query to get an idea if there's a particular join I'm having a problem with.
On all of my temp tables, I like to add unique constraints (where appropriate) to make indexes, and primary keys (almost always).
declare @temp table(
RowID int not null identity(1,1) primary key,
SomeUniqueColumn varchar(25) not null,
SomeNotUniqueColumn varchar(50) null,
unique(SomeUniqueColumn)
)
@DavidM
Assuming MySQL here, use EXPLAIN to find out what is going on with the query, make sure that the indexes are being used as efficiently as possible...
In SQL Server, the execution plan gets you the same thing - it tells you what indexes are being hit, etc.
Not necessarily a SQL performance trick per se but definitely related:
A good idea would be to use memcached where possible as it would be much faster just fetching the precompiled data directly from memory rather than getting it from the database. There's also a flavour of MySQL that has memcached built in (third party).
Make sure your index lengths are as small as possible. This allows the DB to read more keys at a time from the file system, thus speeding up your joins. I assume this works with all DB's, but I know it's a specific recommendation for MySQL.
I've made it a habit to always use bind variables. It's possible bind variables won't help if the RDBMS doesn't cache SQL statements. But if you don't use bind variables the RDBMS doesn't have a chance to reuse query execution plans and parsed SQL statements. The savings can be enormous: http://www.akadia.com/services/ora_bind_variables.html. I work mostly with Oracle, but Microsoft SQL Server works pretty much the same way.
In my experience, if you don't know whether or not you are using bind variables, you probably aren't. If your application language doesn't support them, find one that does. Sometimes you can fix query A by using bind variables for query B.
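In SQL Server terms, one way to get that reuse is a parameterised call through sp_executesql (the table and values below are only placeholders for illustration):
-- The statement text stays constant, so its plan can be cached and reused across parameter values.
EXEC sp_executesql
    N'UPDATE table1 SET somecolumn = @val WHERE ID = @id',
    N'@val NVARCHAR(50), @id NVARCHAR(50)',
    @val = N'someVal',
    @id  = N'theID'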
After that, I talk to our DBA to find out what's causing the RDBMS the most pain. Note that you shouldn't ask "Why is this query slow?" That's like asking your doctor to take out your appendix. Sure your query might be the problem, but it's just as likely that something else is going wrong. As developers, we tend to think in terms of lines of code. If a line is slow, fix that line. But a RDBMS is a really complicated system and your slow query might be the symptom of a much larger problem.
Way too many SQL tuning tips are cargo cult idols. Most of the time the problem is unrelated or minimally related to the syntax you use, so it's normally best to use the cleanest syntax you can. Then you can start looking at ways to tune the database (not the query). Only tweak the syntax when that fails.
Like any performance tuning, always collect meaningful statistics. Don't use wallclock time unless it's the user experience you are tuning. Instead look at things like CPU time, rows fetched and blocks read off of disk. Too often people optimize for the wrong thing.
First step:
Look at the Query Execution Plan!
TableScan -> bad
NestedLoop -> meh warning
TableScan behind a NestedLoop -> DOOM!
SET STATISTICS IO ON
SET STATISTICS TIME ON
Running the query using WITH (NoLock) is pretty much standard operation in my place. Anyone caught running queries on the tens-of-gigabytes tables without it is taken out and shot.
Convert NOT IN queries to LEFT OUTER JOINS if possible. For example if you want to find all rows in Table1 that are unused by a foreign key in Table2 you could do this:
SELECT *
FROM Table1
WHERE Table1.ID NOT IN (
SELECT Table1ID
FROM Table2)
But you get much better performance with this:
SELECT Table1.*
FROM Table1
LEFT OUTER JOIN Table2 ON Table1.ID = Table2.Table1ID
WHERE Table2.ID is null
Index the table(s) by the column(s) you filter by
Prefix all tables with dbo. to prevent recompilations.
View query plans and hunt for table/index scans.
In 2005, scour the management views for missing indexes.
I like to use
isnull(SomeColThatMayBeNull, '')
Over
coalesce(SomeColThatMayBeNull, '')
When I don't need the multiple argument support that coalesce gives you.
http://blog.falafel.com/2006/04/05/SQLServerArcanaISNULLVsCOALESCE.aspx
I look out for:
Unroll any CURSOR loops and convert into set based UPDATE / INSERT statements.
Look out for any application code that:
Calls an SP that returns a large set of records,
Then in the application, goes through each record and calls an SP with parameters to update records.
Convert this into a SP that does all the work in one transaction.
Any SP that does lots of string manipulation. It's evidence that the data is not structured correctly / normalised.
Any SP's that re-invent the wheel.
Any SP where I can't understand what it's trying to do within a minute!
SET NOCOUNT ON
Usually the first line inside my stored procedures, unless I actually need to use @@ROWCOUNT.
In SQL Server, use the nolock directive. It allows the select command to complete without having to wait - usually for other transactions to finish.
SELECT * FROM Orders (nolock) where UserName = 'momma'
Remove cursors wherever they are not necessary.
Remove function calls in Sprocs where a lot of rows will call the function.
My colleague used function calls (getting lastlogindate from userid as example) to return very wide recordsets.
Tasked with optimisation, I replaced the function calls in the sproc with the function's code: I got many sprocs' running time down from > 20 seconds to < 1.
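A sketch of that kind of rewrite, using the last-login example (table and column names are invented):
-- Scalar UDF evaluated once per row:
SELECT u.UserID, dbo.fnGetLastLoginDate(u.UserID) AS LastLoginDate
FROM dbo.Users AS u

-- The same logic inlined as a set-based join/aggregate:
SELECT u.UserID, MAX(l.LoginDate) AS LastLoginDate
FROM dbo.Users AS u
LEFT JOIN dbo.Logins AS l ON l.UserID = u.UserID
GROUP BY u.UserID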
Don't prefix Stored Procedure names with "sp_" because system procedures all start with "sp_", and SQL Server will have to search harder to find your procedure when it gets called.
Dirty reads -
set transaction isolation level read uncommitted
Prevents deadlocks where transactional integrity isn't absolutely necessary (which is usually true)
I always go to SQL Profiler (if it's a stored procedure with a lot of nesting levels) or the query execution planner (if it's a few SQL statements with no nesting) first. 90% of the time you can find the problem immediately with one of these two tools.