Speed Up Processing of SQL in Access 2007

I've been trying to get this SQL query I am using in an Access 2007 database to execute faster. I've already eliminated any distinct queries involved -- and Access doesn't really give me too much information on where the hang-up is. The query is pulling about 150,000 rows and is taking about 2 minutes to complete.
I'm still learning the syntax for SQL in Access, but I think I have it set up correctly. I'd appreciate any insights or hints on what I might be missing.
SELECT OLS_UNITS_GROSS_ACRES.AGMT_NUM,
MAX(OLS_UNITS_GROSS_ACRES.UNIT_GROSS_ACRES) AS UNIT_GROSS,
SUM(MIS_ACREAGES.ACRE_AMT) AS [RELATED ACRES]
FROM ((OLS_UNITS_GROSS_ACRES
INNER JOIN MIS_XREFERENCED_AGMTS_M ON OLS_UNITS_GROSS_ACRES.ARRG_KEY = MIS_XREFERENCED_AGMTS_M.ACTIVE_ARRG_KEY)
INNER JOIN ALL_AGMTS1 ON MIS_XREFERENCED_AGMTS_M.RELATED_ARRG_KEY = ALL_AGMTS1.ARRG_KEY)
INNER JOIN MIS_ACREAGES ON ALL_AGMTS1.ARRG_KEY = MIS_ACREAGES.ARRG_KEY
WHERE (((ALL_AGMTS1.SUBJ_CODE)="LSE")
AND (((MIS_ACREAGES.ACRE_TYPE_CODE)="CNT")
OR ((MIS_ACREAGES.ACRE_TYPE_CODE)="STN")
OR ((MIS_ACREAGES.ACRE_TYPE_CODE)="DNT")))
GROUP BY OLS_UNITS_GROSS_ACRES.AGMT_NUM;

There are several things you could try. Eliminating any DISTINCT queries is a good start. Another way to speed it up is to create indexes on fields that are frequently queried and/or sorted. This speeds up SELECT queries, though it slows down INSERTs, since the data now has to be written to the index as well as the table. If your database design is stable and its objects will not be renamed, you can safely turn off Name AutoCorrect to improve performance. Over time, the performance of a database file can also degrade because of space that remains allocated to deleted or temporary objects; the Compact and Repair command removes this wasted space and can help a database run faster and more efficiently. I got most of this from the following link, which has several other ideas as well: http://office.microsoft.com/en-us/access-help/help-access-run-faster-HA010235589.aspx
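As a rough sketch of the indexing suggestion (the index names below are made up, and this assumes the tables are local Access/Jet tables; if they are linked tables the indexes belong on the back end instead), you could index the join and filter columns like this:
CREATE INDEX idxAcreArrgType ON MIS_ACREAGES (ARRG_KEY, ACRE_TYPE_CODE);
CREATE INDEX idxAgmtsArrgSubj ON ALL_AGMTS1 (ARRG_KEY, SUBJ_CODE);
CREATE INDEX idxXrefKeys ON MIS_XREFERENCED_AGMTS_M (ACTIVE_ARRG_KEY, RELATED_ARRG_KEY);
CREATE INDEX idxUnitsArrg ON OLS_UNITS_GROSS_ACRES (ARRG_KEY);
Access executes one DDL statement at a time in SQL view, so each CREATE INDEX has to be run separately.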

Related

Slow SQL Queries, Order Table by Date?

I have a SQL Server 2008 database that I query regularly, and it has over 30 million entries (joy!). Unfortunately this database cannot be drastically changed because it is still in use for R/D.
When I query this database, it takes FOREVER. By that I mean I haven't been patient enough to wait for results (after 2 mins I have to cancel to avoid locking the R/D department out). With anything more than a short date range (more than a few months), it is basically impossible to get any results from it. I am querying with requirements on 4 of the columns and unfortunately have to use an inner join for another table (which I've been told is very costly in terms of query efficiency, but it is unavoidable). This inner-joined table has less than 100k entries.
What I was wondering is: is it possible to organize the table so that it is ordered by date by default, to reduce the number of results it has to search through?
If this is not possible, is there anything I can do to reduce query times? Is there any other useful information that could assist me in coming up with a solution?
I have included a sample of the query that I use:
SELECT DISTINCT N.TestName
FROM [DalsaTE].[dbo].[ResultsUut] U
INNER JOIN [DalsaTE].[dbo].[ResultsNumeric] N
ON N.ModeDescription = 'Mode 8: Low Gain - Green-Blue'
AND N.ResultsUutId = U.ResultsUutId
WHERE U.DeviceName = 'BO-32-3HK60-00-R'
AND U.StartDateTime > '2011-11-25 01:10:10.001'
ORDER BY N.TestName
Any help or suggestions are appreciated!
It sounds like the datetime may be stored in a text-based field, and consequently an index isn't being used?
Could you try the following to see if you have any speed improvement:
select distinct N.TestName
from [DalsaTE].[dbo].[ResultsUut] U
inner join [DalsaTE].[dbo].[ResultsNumeric] N
on N.ModeDescription = 'Mode 8: Low Gain - Green-Blue'
and N.ResultsUutId = U.ResultsUutId
where U.DeviceName = 'BO-32-3HK60-00-R'
and U.StartDateTime > cast('2011-11-25 01:10:10.001' as datetime)
order by N.TestName
It would also be worth trying to change your inner join to a left outer join, as those occasionally perform faster for no obvious reason (at least none that I'm aware of).
You can add an index based on your date column, which should improve your query time. You can either use an ALTER TABLE command or use the table designer.
Is the sole purpose of the join to provide sorting? If so, a quick thing to try would be to remove this, and see how much of a difference it makes - at least then you'll know where to focus your attention.
Finally, SQL server management studio has some useful tools such as execution plans that can help diagnose performance issues. Good luck!
There are a number of problems which may be causing delays in the execution of your query.
Indexes (except the primary key) do not reorder the data, they merely create an index (think phonebook) which orders a number of values and points back to the primary key.
Without seeing the type of data or the existing indexes, it's difficult, but at the very least, the following ASCENDING indexes might help:
[DalsaTE].[dbo].[ResultsNumeric] ModeDescription and ResultsUutId and TestName
[DalsaTE].[dbo].[ResultsUut] StartDateTime and DeviceName and ResultsUutId
With the indexes above, the sample query you gave could be satisfied without performing a single lookup on the actual table data.
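As a minimal sketch of those two indexes (the index names are made up and the column order simply follows the suggestion above):
CREATE INDEX IX_ResultsNumeric_Mode_Uut_Test ON [DalsaTE].[dbo].[ResultsNumeric] (ModeDescription, ResultsUutId, TestName);
CREATE INDEX IX_ResultsUut_Start_Device_Uut ON [DalsaTE].[dbo].[ResultsUut] (StartDateTime, DeviceName, ResultsUutId);
Given the equality filter on DeviceName and the range filter on StartDateTime, it may also be worth testing the second index with DeviceName as the leading column.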

Oracle LEADING hint -- why is this required?

Suddenly (but unfortunately I don't know when "suddenly" was; I know it ran fine at some point in the past) one of my queries started taking 7+ seconds instead of milliseconds to execute. I have 1 local table and 3 tables being accessed via a DB link. The 3 remote tables are joined together, and one of them is joined with my local table.
The local table's where clause only takes a few millis to execute on its own, and only returns a few (10's or 100's at the most) records. The 3 remote tables have many hundreds of thousands, possibly millions, of records between them, and if I join them appropriately I get tens or hundreds of thousands of records.
I am only joining with the remote tables so that I can pull out a few pieces of data related to each record in my local table.
What appears to be happening, however, is that Oracle joins the remote tables together first and then my local table to that mess at the end. This is always going to be a bad idea, especially given the data set that exists right now, so I added a /*+ LEADING(local_tab remote_tab_1) */ hint to my query and it now returns in milliseconds.
I compared the explain plans and they are almost identical, save for a single BUFFER SORT on one of the remote tables.
I'm wondering what might cause Oracle to approach this the wrong way? Is it an index issue? What should I be looking for?
When choosing an execution plan, Oracle estimates costs for the different plans. One crucial input for that estimate is the number of rows each step of the execution plan will return. Oracle estimates those using 'statistics', i.e. information about how many rows a table contains, how many different values a column contains, and how evenly those values are distributed.
These statistics are just that: statistics. They might be wrong, and that is one of the most common reasons for misjudgments by the Oracle optimizer.
So gathering new statistics as described in a comment might help. Have a look at the documentation for the DBMS_STATS package; there are many different ways to call it.
A common problem I've come across is a query that joins many tables, where the joins form a chain from one end to another, e.g.:
SELECT *
FROM tableA, tableB, tableC, tableD, tableE
WHERE tableA.ID0 = :bind1
AND tableA.ID1 = tableB.ID1
AND tableB.ID2 = tableC.ID2
AND tableC.ID3 = tableD.ID3
AND tableD.ID4 = tableE.ID4
AND tableE.ID5 = :bind2;
Notice how the optimiser might choose to drive the query from tableA (e.g. if the index on ID0 is nicely selective) or from tableE (if the index on tableE.ID5 is more selective).
The statistics on the tables might cause the choice between these two plans to balance on a knife-edge; one day it's working fine (driving from tableA), next day new stats are gathered and all of a sudden the alternative plan driving from tableE has a lower cost and is chosen.
In this circumstance, adding a LEADING hint is one way to nudge it back to the original plan (i.e. drive from tableA) without dictating too much to the optimiser (i.e. it doesn't force the optimiser to choose any particular join methods).
You're doing distributed query optimization, and that's a tricky beast. It could be that your table's statistics are current, but the tables at the remote system are now out-of-whack or have changed. Or the remote system added/removed/modified indexes, and that broke your plan. (This is an excellent reason to consider replication -- so you can control indexes and statistics against it.)
That said, Oracle's estimate of cardinality is a primary driver in execution plan. A 10053 trace analysis (Jonathan Lewis' Cost-Based Oracle Fundamentals book has wonderful examples from 8i to 10.1) can help shed light on why your statement's now broken and how the LEADING hint fixes it.
The DRIVING_SITE hint might be a better choice if you know you always want the local tables to be joined first before going after the remote site; it clarifies your intention without driving the plan the way a LEADING hint would.
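A hedged sketch of both hints, with invented names (local_tab is local, remote_tab_1 sits behind a database link called remote_db):
-- pin the join order: the local table drives
SELECT /*+ LEADING(l r) */ l.id, r.extra_col
FROM local_tab l
JOIN remote_tab_1@remote_db r ON r.id = l.id
WHERE l.status = 'OPEN';
-- only state which site should execute the query; the join order is left to the optimiser
SELECT /*+ DRIVING_SITE(l) */ l.id, r.extra_col
FROM local_tab l
JOIN remote_tab_1@remote_db r ON r.id = l.id
WHERE l.status = 'OPEN';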
Might not be relevant but I had a similar situation once where the remote table had been replaced by a single-table view. When it was a table the distributed query optimizer 'saw' that it had an index. When it became a view it couldn't see the index anymore and couldn't cost a plan that used an index on the remote object.
That was a few years ago. I documented my analysis at the time here.
RI,
It's hard to be sure about the cause of the performance problems without seeing the SQL.
When an Oracle query was performing well before, and suddenly starts performing badly, it is usually related to one of two issues:
A) Statistics are out of date. This is the easiest and quickest thing to check, even if you have a housekeeping batch process that's supposed to take care of it ... always double-check.
B) Data volume / data pattern change.
In your case, running a distributed query across multiple databases makes it 10x harder for Oracle to manage performance between them. Is it possible to put these tables in one database, perhaps separate schema owners in one database?
Hints are notoriously fragile, as Oracle is under no obligation to follow them. When the data volume or pattern changes some more, Oracle may just ignore the hint and do what it thinks is best (ie. worst ;-).
If you cannot put these tables all in one database, then I recommend you look at breaking your query up into two statements (a sketch follows the two steps below):
INSERT on sub-SELECT to copy external data to a global temporary table in your current database.
SELECT from the global temporary table to join with your other table.
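A minimal sketch of that two-step approach, with all object names invented (remote_tab_1 is reached over a database link called remote_db):
-- defined once, not per run
CREATE GLOBAL TEMPORARY TABLE gtt_remote_rows (
    arrg_key  NUMBER,
    extra_col VARCHAR2(100)
) ON COMMIT PRESERVE ROWS;
-- step 1: copy only the remote rows and columns you actually need
INSERT INTO gtt_remote_rows (arrg_key, extra_col)
SELECT r.arrg_key, r.extra_col
FROM remote_tab_1@remote_db r
WHERE r.some_filter = 'X';
-- step 2: the join is now entirely local
SELECT l.id, g.extra_col
FROM local_tab l
JOIN gtt_remote_rows g ON g.arrg_key = l.id;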
You will have complete control over performance of step 1 above without resorting to hints. This approach typically scales well, providing you take time to do the performance tuning. I've seen this approach solve many complex performance problems.
The overhead for Oracle to create a whole new table, or insert a heap of records, is much smaller than most people expect. Defining a global temporary table further reduces that overhead.
Matthew

TSQL Join efficiency

I'm developing an ASP.NET/C#/SQL application. I've created a query for a specific grid-view that involves a lot of joins to get the data needed. On the hosted server, the query has randomly started taking up to 20 seconds to process. I'm sure it's partly an overloaded host-server (because sometimes the query takes <1s), but I don't think the query (which is actually a view reference via a stored procedure) is at all optimal regardless.
I'm unsure how to improve the efficiency of the below query:
(There are about 1500 matching records to those joins, currently)
SELECT dbo.ca_Connections.ID,
dbo.ca_Connections.Date,
dbo.ca_Connections.ElectricityID,
dbo.ca_Connections.NaturalGasID,
dbo.ca_Connections.LPGID,
dbo.ca_Connections.EndUserID,
dbo.ca_Addrs.LotNumber,
dbo.ca_Addrs.UnitNumber,
dbo.ca_Addrs.StreetNumber,
dbo.ca_Addrs.Street1,
dbo.ca_Addrs.Street2,
dbo.ca_Addrs.Suburb,
dbo.ca_Addrs.Postcode,
dbo.ca_Addrs.LevelNumber,
dbo.ca_CompanyConnectors.ConnectorID,
dbo.ca_CompanyConnectors.CompanyID,
dbo.ca_Connections.HandOverDate,
dbo.ca_Companies.Name,
dbo.ca_States.State,
CONVERT(nchar, dbo.ca_Connections.Date, 103) AS DateView,
CONVERT(nchar, dbo.ca_Connections.HandOverDate, 103) AS HandOverDateView
FROM dbo.ca_CompanyConnections
INNER JOIN dbo.ca_CompanyConnectors ON dbo.ca_CompanyConnections.CompanyID = dbo.ca_CompanyConnectors.CompanyID
INNER JOIN dbo.ca_Connections ON dbo.ca_CompanyConnections.ConnectionID = dbo.ca_Connections.ID
INNER JOIN dbo.ca_Addrs ON dbo.ca_Connections.AddressID = dbo.ca_Addrs.ID
INNER JOIN dbo.ca_Companies ON dbo.ca_CompanyConnectors.CompanyID = dbo.ca_Companies.ID
INNER JOIN dbo.ca_States ON dbo.ca_Addrs.StateID = dbo.ca_States.ID
It may have nothing to do with your query and everything to do with the data transfer.
How fast does the query run in query analyzer?
How does this compare to the web page?
If you are bringing back the entire data set you may want to introduce paging, say 100 records per page.
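As a hedged sketch of paging with ROW_NUMBER() (dbo.vw_Connections is an invented name standing in for the view above, and the page boundaries are illustrative):
WITH Paged AS (
    SELECT v.*, ROW_NUMBER() OVER (ORDER BY v.Date DESC) AS RowNum
    FROM dbo.vw_Connections v
)
SELECT *
FROM Paged
WHERE RowNum BETWEEN 1 AND 100; -- page 1, 100 rows per page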
The first thing I normally suggest is to profile to look for potential indexes to help out. But when the problem is sporadic like this and the normal case is for the query to run in <1 sec, it's more likely due to lock contention than a missing index. That means the cause is something else in the system making this query take longer: perhaps an insert or update, or perhaps another select query, one you would normally expect to take a little longer, so the extra time on its end isn't noticed.
I would start with indexing, but I have a database that is part of a third-party application, so creating my own indexes is not an option. I read an article (sorry, can't find the reference) recommending breaking the query up into table variables or temp tables (depending on the number of records) when you have multiple tables in your query (not sure what the magic number is).
Start with dbo.ca_CompanyConnections, dbo.ca_CompanyConnectors and dbo.ca_Connections. Include the fields you need. Then substitute these three joined tables with just the temp table (see the sketch below).
Not sure what the issue is (I would like to hear recommendations), but it seems like performance starts to drop once you get over about 5 tables.
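A hedged sketch of that idea against the view's tables (the column list is trimmed for brevity and #conn is an invented temp table name):
-- stage the three connection tables first
SELECT c.ID, c.Date, c.HandOverDate, c.AddressID, cc.ConnectorID, cc.CompanyID
INTO #conn
FROM dbo.ca_CompanyConnections ccn
INNER JOIN dbo.ca_CompanyConnectors cc ON ccn.CompanyID = cc.CompanyID
INNER JOIN dbo.ca_Connections c ON ccn.ConnectionID = c.ID;
-- then join the remaining tables against the much smaller temp table
SELECT t.*, a.StreetNumber, a.Suburb, a.Postcode, co.Name, s.State
FROM #conn t
INNER JOIN dbo.ca_Addrs a ON t.AddressID = a.ID
INNER JOIN dbo.ca_Companies co ON t.CompanyID = co.ID
INNER JOIN dbo.ca_States s ON a.StateID = s.ID;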

Need tips for optimizing SQL Query using a JOIN

The Query I'm writing runs fine when looking at the past few days, once I go over a week it crawls (~20min). I am joining 3 tables together. I was wondering what things I should look for to make this run faster. I don't really know what other information is needed for the post.
EDIT: More info: db is Sybase 10. Query:
SELECT a.id, a.date, a.time, a.signal, a.noise,
b.signal_strength, b.base_id, b.firmware,
a.site, b.active, a.table_key_id
FROM adminuser.station AS a
JOIN adminuser.base AS b
ON a.id = b.base_id
WHERE a.site = 1234 AND a.date >= '2009-03-20'
I also took out the 3rd JOIN and it still runs extremely slow. Should I try another JOIN method?
I don't know Sybase 10 that well, but try running that query for, say, a 10-day period and then 10 more times, once for each day in that period, and compare the times. If the time in the first case is much higher, you've probably hit the database cache limits.
The solution then is to simply run queries for shorter periods in a loop (in the program, not in SQL). It works especially well if table A is partitioned by date.
You can get a lot of information (assuming you're using MSSQL here) by running your query in SQL Server Management Studio with the Include Actual Execution Plan option set (in the Query menu).
This will show you a diagram of the steps that SQLServer performs in order to execute the query - with relative costs against each step.
The next step is to rework the query a little (try doing it a different way) then run the new version and the old version at the same time. You will get two execution plans, with relative costs not only against each step, but against the two versions of the query! So you can tell objectively if you are making progress.
I do this all the time when debugging/optimizing queries.
Make sure you have indexes on the foreign keys.
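For the query above, that might look something like this (the index names are made up, and I'm assuming plain nonclustered indexes on the filter and join columns are acceptable):
CREATE INDEX station_site_date ON adminuser.station (site, date)
CREATE INDEX base_baseid ON adminuser.base (base_id)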
It sounds more like you have a memory leak or aren't closing database connections in your client code than that there's anything wrong with the query.
[edit]
Nevermind: you mean querying over a date range rather than the duration the server has been active. I'll leave this up to help others avoid the same confusion.
Also, it would help if you could post the SQL query, even if you need to obfuscate it a little first, and it's a good bet to check whether there's an index on your date column and how many records the longer range returns.
You may want to look into using a PARTITION for the date ranges, if your DB supports it. I've heard this can help significantly.
Grab the book "Professional SQL Server 2005 Performance Tuning" its pretty great.
You didn't mention your database. If it's not SQL Server, the specifics of how to get the data might be different, but the advice is fundamentally the same.
Look at indexing, for sure, but the first thing to do is to follow Blorgbeard's advice and scan for execution plans using Management Studio (again, if you are running SQL Server).
What I'm guessing you'll see is that for small date ranges, the optimizer picks a reasonable query plan, but that when the date range is large, it picks something completely different, likely involving either table scans or index scans, and possibly joins that lead to very large temporary recordsets. The execution plan analyzer will reveal all of this.
A scan means that the optimizer thinks that grinding over the whole table or the whole index is cheaper for what you are trying to do than seeking specific values.
What you eventually want to do is get indexes and the syntax of your query set up such that you keep index seeks in the query plan for your query regardless of the date range, or, failing that, that the scans you require are filtered as well as you can manage to minimize temporary recordset size and thereby avoid excessive reads and I/O.
SELECT
a.id, a.date, a.time, a.signal, a.noise,a.site, b.active, a.table_key_id,
b.signal_strength, b.base_id, b.firmware
FROM
( SELECT * FROM adminuser.station
WHERE site = 1234 AND date >= '2009-03-20') AS a
JOIN
adminuser.base AS b
ON
a.id = b.base_id
I've rewritten the query to filter the desired rows first and then perform the join, rather than performing the join and then filtering the result.
Rather than pulling * from the sub-query, you can select just the columns you want, which might help a little.
Maybe this will be of some help in speeding things up.
This is valid in MySQL; I am not sure of the Sybase syntax, though.

Favourite performance tuning tricks [closed]

When you have a query or stored procedure that needs performance tuning, what are some of the first things you try?
Here is the handy-dandy list of things I always give to someone asking me about optimisation.
We mainly use Sybase, but most of the advice will apply across the board.
SQL Server, for example, comes with a host of performance monitoring / tuning bits, but if you don't have anything like that (and maybe even if you do) then I would consider the following...
99% of problems I have seen are caused by putting too many tables in a join. The fix for this is to do half the join (with some of the tables) and cache the results in a temporary table. Then do the rest of the query joining on that temporary table.
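Here's a hedged sketch of that pattern; every table and column name below is invented:
/* do half the join and cache the result */
SELECT a.key_col, a.col1, b.col2
INTO #half_join
FROM tableA a
INNER JOIN tableB b ON b.key_col = a.key_col
WHERE a.some_filter = 'X'
/* then finish the query against the much smaller cached set */
SELECT h.col1, h.col2, c.col3
FROM #half_join h
INNER JOIN tableC c ON c.key_col = h.key_col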
Query Optimisation Checklist
Run UPDATE STATISTICS on the underlying tables
Many systems run this as a scheduled weekly job
Delete records from underlying tables (possibly archive the deleted records)
Consider doing this automatically once a day or once a week.
Rebuild Indexes
Rebuild Tables (bcp data out/in)
Dump / Reload the database (drastic, but might fix corruption)
Build new, more appropriate index
Run DBCC to see if there is possible corruption in the database
Locks / Deadlocks
Ensure no other processes running in database
Especially DBCC
Are you using row or page level locking?
Lock the tables exclusively before starting the query
Check that all processes are accessing tables in the same order
Are indices being used appropriately?
Joins will only use index if both expressions are exactly the same data type
Index will only be used if the first field(s) on the index are matched in the query
Are clustered indices used where appropriate?
range data
WHERE field between value1 and value2
Small Joins are Nice Joins
By default the optimiser will only consider the tables 4 at a time.
This means that in joins with more than 4 tables, it has a good chance of choosing a non-optimal query plan
Break up the Join
Can you break up the join?
Pre-select foreign keys into a temporary table
Do half the join and put results in a temporary table
Are you using the right kind of temporary table?
#temp tables may perform much better than @table variables with large volumes (thousands of rows).
Maintain Summary Tables
Build with triggers on the underlying tables
Build daily / hourly / etc.
Build ad-hoc
Build incrementally or teardown / rebuild
See what the query plan is with SET SHOWPLAN ON
See what’s actually happening with SET STATS IO ON
Force an index using the pragma: (index: myindex)
Force the table order using SET FORCEPLAN ON
Parameter Sniffing:
Break Stored Procedure into 2
call proc2 from proc1
allows optimiser to choose index in proc2 if @parameter has been changed by proc1
Can you improve your hardware?
What time are you running? Is there a quieter time?
Is Replication Server (or other non-stop process) running? Can you suspend it? Run it eg. hourly?
Have a pretty good idea of the optimal path of running the query in your head.
Check the query plan - always.
Turn on STATS, so that you can examine both IO and CPU performance. Focus on driving those numbers down, not necessarily the query time (as that can be influenced by other activity, cache, etc.).
Look for large numbers of rows coming into an operator, but small numbers coming out. Usually, an index would help by limiting the number of rows coming in (which saves disk reads).
Focus on the largest cost subtree first. Changing that subtree can often change the entire query plan.
Common problems I've seen are:
If there's a lot of joins, sometimes SQL Server will choose to expand the joins, and then apply the WHERE clauses. You can usually fix this by moving the WHERE conditions into the JOIN clause, or into a derived table with the conditions inlined (see the sketch after this list). Views can cause the same problems.
Suboptimal joins (LOOP vs HASH vs MERGE). My rule of thumb is to use a LOOP join when the top row has very few rows compared to the bottom, a MERGE when the sets are roughly equal and ordered, and a HASH for everything else. Adding a join hint will let you test your theory.
Parameter sniffing. If you ran the stored proc with unrealistic values at first (say, for testing), then the cached query plan may be suboptimal for your production values. Running again WITH RECOMPILE should verify this. For some stored procs, especially those that deal with varying sized ranges (say, all dates between today and yesterday - which would entail an INDEX SEEK - or, all dates between last year and this year - which would be better off with an INDEX SCAN) you may have to run it WITH RECOMPILE every time.
Bad indentation...Okay, so Sql Server doesn't have an issue with this - but I sure find it impossible to understand a query until I've fixed up the formatting.
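Here's the sketch promised above for the join-expansion problem, using invented tables. Both statements return the same rows; the second just states the filter inline so the optimiser is nudged to apply it before the join:
/* filter written in the WHERE clause, after the join */
SELECT o.OrderID, c.Name
FROM Orders o
INNER JOIN Customers c ON c.CustomerID = o.CustomerID
WHERE c.Region = 'EU'
/* the same filter inlined in a derived table */
SELECT o.OrderID, c.Name
FROM Orders o
INNER JOIN (SELECT CustomerID, Name FROM Customers WHERE Region = 'EU') c
    ON c.CustomerID = o.CustomerID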
Slightly off topic but if you have control over these issues...
High level and High Impact.
For high-IO environments, make sure your disks are set up as RAID 10 or RAID 0+1, or some nested combination of RAID 1 and RAID 0.
Don't use drives less than 1500K.
Make sure your disks are used only for your database, i.e. no logs and no OS on the same disks.
Turn off auto grow or similar feature. Let the database use all storage that is anticipated. Not necessarily what is currently being used.
Design your schema and indexes for the types of queries you run.
If it's a log-type table (insert only) and it must be in the DB, don't index it.
If you're doing a lot of reporting (complex selects with many joins) then you should look at creating a data warehouse with a star or snowflake schema.
Don't be afraid of replicating data in exchange for performance!
CREATE INDEX
Assure there are indexes available for your WHERE and JOIN clauses. This will speed data access greatly.
If your environment is a data mart or warehouse, indexes should abound for almost any conceivable query.
In a transactional environment, the number of indexes should be lower and their definitions more strategic so that index maintenance doesn't drag down resources. (Index maintenance is when the leaves of an index must be changed to reflect a change in the underlying table, as with INSERT, UPDATE, and DELETE operations.)
Also, be mindful of the order of fields in the index - the more selective (higher cardinality) a field, the earlier in the index it should appear. For example, say you're querying for used automobiles:
SELECT i.make, i.model, i.price
FROM dbo.inventory i
WHERE i.color = 'red'
AND i.price BETWEEN 15000 AND 18000
Price generally has higher cardinality. There may be only a few dozen colors available, but quite possibly thousands of different asking prices.
Of these index choices, idx01 provides the faster path to satisfy the query:
CREATE INDEX idx01 ON dbo.inventory (price, color)
CREATE INDEX idx02 ON dbo.inventory (color, price)
This is because fewer cars will satisfy the price point than the color choice, giving the query engine far less data to analyze.
I've been known to have two very similar indexes differing only in the field order to speed queries (firstname, lastname) in one and (lastname, firstname) in the other.
Assuming MySQL here, use EXPLAIN to find out what is going on with the query, make sure that the indexes are being used as efficiently as possible and try to eliminate file sorts. High Performance MySQL: Optimization, Backups, Replication, and More is a great book on this topic as is MySQL Performance Blog.
A trick I recently learned is that SQL Server can update local variables as well as fields, in an update statement.
UPDATE table
SET @variable = column = @variable + otherColumn
Or the more readable version:
UPDATE table
SET
@variable = @variable + otherColumn,
column = @variable
I've used this to replace complicated cursors/joins when implementing recursive calculations, and also gained a lot in performance.
Here's details and example code that made fantastic improvements in performance:
Link
@Terrapin there are a few other differences between isnull and coalesce that are worth mentioning (besides ANSI compliance, which is a big one for me).
Coalesce vs. IsNull
Sometimes in SQL Server, using an OR in a WHERE clause will really hurt performance. Instead of using the OR, just do two selects and union them together. You get the same results, often dramatically faster.
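A hedged sketch with an invented Orders table (OrderID is assumed unique, so the UNION's duplicate removal doesn't change the result):
DECLARE @cust int, @rep int
/* the OR form: can end up scanning rather than using either index */
SELECT OrderID FROM Orders WHERE CustomerID = @cust OR SalesRepID = @rep
/* the same result as two seek-friendly selects unioned together */
SELECT OrderID FROM Orders WHERE CustomerID = @cust
UNION
SELECT OrderID FROM Orders WHERE SalesRepID = @rep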
Look at the where clause - verify use of indexes / verify nothing silly is being done
where SomeComplicatedFunctionOf(table.Column) = @param --silly
I'll generally start with the joins - I'll knock each one of them out of the query one at a time and re-run the query to get an idea if there's a particular join I'm having a problem with.
On all of my temp tables, I like to add unique constraints (where appropriate) to make indexes, and primary keys (almost always).
declare @temp table(
RowID int not null identity(1,1) primary key,
SomeUniqueColumn varchar(25) not null,
SomeNotUniqueColumn varchar(50) null,
unique(SomeUniqueColumn)
)
@DavidM
Assuming MySQL here, use EXPLAIN to find out what is going on with the query, make sure that the indexes are being used as efficiently as possible...
In SQL Server, execution plan gets you the same thing - it tells you what indexes are being hit, etc.
Not necessarily a SQL performance trick per se, but definitely related:
A good idea would be to use memcached where possible, as it is much faster to fetch precompiled data directly from memory than to get it from the database. There's also a third-party flavour of MySQL with memcached built in.
Make sure your index lengths are as small as possible. This allows the DB to read more keys at a time from the file system, thus speeding up your joins. I assume this works with all DB's, but I know it's a specific recommendation for MySQL.
I've made it a habit to always use bind variables. It's possible bind variables won't help if the RDBMS doesn't cache SQL statements. But if you don't use bind variables the RDBMS doesn't have a chance to reuse query execution plans and parsed SQL statements. The savings can be enormous: http://www.akadia.com/services/ora_bind_variables.html. I work mostly with Oracle, but Microsoft SQL Server works pretty much the same way.
In my experience, if you don't know whether or not you are using bind variables, you probably aren't. If your application language doesn't support them, find one that does. Sometimes you can fix query A by using bind variables for query B.
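A tiny Oracle-flavoured sketch (orders is an invented table):
/* literal value: every distinct customer id is a brand-new statement to parse */
SELECT order_id, total FROM orders WHERE customer_id = 1042;
/* bind variable: one shared statement, one cached plan */
SELECT order_id, total FROM orders WHERE customer_id = :cust_id;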
After that, I talk to our DBA to find out what's causing the RDBMS the most pain. Note that you shouldn't ask "Why is this query slow?" That's like asking your doctor to take out your appendix. Sure, your query might be the problem, but it's just as likely that something else is going wrong. As developers, we tend to think in terms of lines of code. If a line is slow, fix that line. But a RDBMS is a really complicated system and your slow query might be the symptom of a much larger problem.
Way too many SQL tuning tips are cargo cult idols. Most of the time the problem is unrelated or minimally related to the syntax you use, so it's normally best to use the cleanest syntax you can. Then you can start looking at ways to tune the database (not the query). Only tweak the syntax when that fails.
Like any performance tuning, always collect meaningful statistics. Don't use wallclock time unless it's the user experience you are tuning. Instead look at things like CPU time, rows fetched and blocks read off of disk. Too often people optimize for the wrong thing.
First step:
Look at the Query Execution Plan!
TableScan -> bad
NestedLoop -> meh warning
TableScan behind a NestedLoop -> DOOM!
SET STATISTICS IO ON
SET STATISTICS TIME ON
Running the query using WITH (NoLock) is pretty much standard operation in my place. Anyone caught running queries on the tens-of-gigabytes tables without it is taken out and shot.
Convert NOT IN queries to LEFT OUTER JOINS if possible. For example if you want to find all rows in Table1 that are unused by a foreign key in Table2 you could do this:
SELECT *
FROM Table1
WHERE Table1.ID NOT IN (
SELECT Table1ID
FROM Table2)
But you get much better performance with this:
SELECT Table1.*
FROM Table1
LEFT OUTER JOIN Table2 ON Table1.ID = Table2.Table1ID
WHERE Table2.ID is null
Index the table(s) by the clm(s) you filter by
Prefix all tables with dbo. to prevent recompilations.
View query plans and hunt for table/index scans.
In 2005, scour the management views for missing indexes.
I like to use
isnull(SomeColThatMayBeNull, '')
Over
coalesce(SomeColThatMayBeNull, '')
When I don't need the multiple argument support that coalesce gives you.
http://blog.falafel.com/2006/04/05/SQLServerArcanaISNULLVsCOALESCE.aspx
I look out for:
Unroll any CURSOR loops and convert into set based UPDATE / INSERT statements.
Look out for any application code that:
Calls an SP that returns a large set of records,
Then in the application, goes through each record and calls an SP with parameters to update records.
Convert this into a SP that does all the work in one transaction.
Any SP that does lots of string manipulation. It's evidence that the data is not structured correctly / normalised.
Any SP's that re-invent the wheel.
Any SP's that I can't understand what it's trying to do within a minute!
SET NOCOUNT ON
Usually the first line inside my stored procedures, unless I actually need to use @@ROWCOUNT.
In SQL Server, use the nolock directive. It allows the select command to complete without having to wait - usually other transactions to finish.
SELECT * FROM Orders (nolock) where UserName = 'momma'
Remove cursors wherever they are not necessary.
Remove function calls in Sprocs where a lot of rows will call the function.
My colleague used function calls (getting lastlogindate from userid, for example) to return very wide recordsets.
Tasked with optimisation, I replaced the function calls in the sproc with the function's code: I got many sprocs' running time down from > 20 seconds to < 1.
Don't prefix Stored Procedure names with "sp_" because system procedures all start with "sp_", and SQL Server will have to search harder to find your procedure when it gets called.
Dirty reads -
set transaction isolation level read uncommitted
Prevents deadlocks where transactional integrity isn't absolutely necessary (which is usually true)
I always go to SQL Profiler (if it's a stored procedure with a lot of nesting levels) or the query execution planner (if it's a few SQL statements with no nesting) first. 90% of the time you can find the problem immediately with one of these two tools.