I have not found any suitable way to share the query plan other than as an image, so I have attached one. The image shows the execution plan, and I want to reduce the cost of the full outer join. If anyone can suggest a way to reduce that cost and get a better plan, it would be great: link
WITH cte AS
(
SELECT
coalesce(fact_connect_hours.dimProviderId, fact_connect_hour_hum_shifts.dimProviderId, fact_connect_hour_clock_times.dimProviderId) as dimProviderId
,coalesce(fact_connect_hours.dimScribeId, fact_connect_hour_hum_shifts.dimScribeId, fact_connect_hour_clock_times.dimScribeId) as dimScribeId
,coalesce(fact_connect_hours.dimDateId, fact_connect_hour_hum_shifts.dimDateId, fact_connect_hour_clock_times.dimDateId) as dimDateId
,factConnectHourId
,totalProviderLogTime
,providerFirstJoinTime
,providerLastEndTime
,scribeFirstLogin
,scribeLastLogout
,totalScribeLogTime
, totalScopeTime
, totalStreamTime
, firstScopeJoinTime
, lastScopeEndTime
, scopeLastActivityTime
, firstStreamJoinTime
, lastStreamEndTime
, streamLastActivityTime
,fact_connect_hour_hum_shifts.shiftStartTime
,fact_connect_hour_hum_shifts.shiftEndTime
,fact_connect_hour_hum_shifts.totalShiftTime
,fact_connect_hour_clock_times.ClockStartTimestamp
,fact_connect_hour_clock_times.ClockEndTimestamp
,fact_connect_hour_clock_times.totalClockTime
,fact_connect_hour_hum_shifts.shiftTitle
,fact_connect_hours.dimStatusId
,dim_status.status
FROM fact_connect_hours
INNER JOIN dim_status on fact_connect_hours.dimStatusId=dim_status.dimStatusId
full outer JOIN fact_connect_hour_hum_shifts
ON ( fact_connect_hour_hum_shifts.dimDateId=fact_connect_hours.dimDateId
and fact_connect_hour_hum_shifts.dimProviderId=fact_connect_hours.dimProviderId
and fact_connect_hour_hum_shifts.dimScribeId=fact_connect_hours.dimScribeId)
full outer join fact_connect_hour_clock_times
on (fact_connect_hours.dimDateId = fact_connect_hour_clock_times.dimDateId
and fact_connect_hours.dimProviderId= fact_connect_hour_clock_times.dimProviderId
and fact_connect_hours.dimScribeId = fact_connect_hour_clock_times.dimScribeId
)
WHERE coalesce(fact_connect_hours.dimDateId,fact_connect_hour_hum_shifts.dimDateId,fact_connect_hour_clock_times.dimDateId)>=732
) SELECT cte.*
,dim_date.tranDate
,dim_date.tranMonth
,dim_date.tranMonthName
,dim_date.tranYear
,dim_date.tranWeek
,dim_scribe.scribeUId
,dim_scribe.scribeFirstname
,dim_scribe.scribeFullname
,dim_scribe.scribeLastname
,dim_scribe.location
,dim_scribe.partner
,dim_scribe.beta
,dim_scribe.currentStatus
,dim_scribe.scribeEmail
,dim_scribe.augmedixEmail
,dim_scribe.partner
,dim_provider.scribeManager
,dim_provider.clinicalAccountManagerName
,dim_provider.providerUId
,dim_provider.beta
,dim_provider.accountName
,dim_provider.accountGroup
,dim_provider.accountType
,dim_provider.goLiveDate
,dim_provider.siteName
,dim_provider.churnDate
,dim_provider.providerFullname
,dim_provider.providerEmail
from cte
INNER JOIN dim_date on cte.dimDateId=dim_date.dimDateId
inner JOIN aug_bi_dw.dbo.dim_provider AS dim_provider on cte.dimProviderId=dim_provider.dimProviderId
inner join aug_bi_dw.dbo.dim_scribe AS dim_scribe on cte.dimScribeId=dim_scribe.dimScribeId
where dim_date.dimDateId>=732
Based on the table names (dim* and fact*), I'll assume you are doing a report of sorts over a data warehouse schema. Assuming this is the case, then likely the best thing you can do to improve performance is to consider using Columnstore indexes (and batch mode execution which is implicit once you enable Columnstores). These indexes are heavily compressed and often give significant performance gains on IO-bound workloads. Fact tables are the usual candidates as they are largest/often don't fit in the buffer pool.
Columnstores are supported in all editions in SQL 2016 onwards and go faster in Enterprise Edition (more parallelism, faster operations internally like using SIMD instructions, etc.). Please note that they don't directly support primary keys, so this may impact how you lay out the tables a bit. You can create keys (as b-tree secondary indexes internally), so some of the space savings are lost if you use primary keys. Often, fact tables + columnstores also use partitioning to get another layer of filtering without secondary indexes.
Please consider trying your query again with columnstores replacing the fact tables (perhaps on a copy of your database to do an experiment). When you look at the query plans that result, I suggest that you also look to see if the operators are running in batch mode. Batch mode operators are different than their row mode counterparts. The batch mode ones are optimized for the architectures of modern CPUs to minimize the amount of memory traffic in and out of the CPU. As a rough rule-of-thumb, 10x-100x difference is possible with columnstores + batch mode.
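If you want to experiment, converting a fact table to a clustered columnstore is a one-statement change. This is only a hedged sketch (the index name is made up, and if the table already has a clustered rowstore index you would drop it first or use WITH (DROP_EXISTING = ON)):
CREATE CLUSTERED COLUMNSTORE INDEX cci_fact_connect_hours
    ON dbo.fact_connect_hours;
-- repeat for fact_connect_hour_hum_shifts and fact_connect_hour_clock_times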
The only filter that can help you is 'where dim_date.dimDateId>=X'
That filter reaches the data through a join to the CTE, and the CTE column it targets is built from three tables that are outer-joined to each other. For best performance I would tell SQL Server what to do step by step; otherwise it is risky to expect the optimizer to find the best plan for the query as written:
Run three separate statements against fact_connect_hours, fact_connect_hour_hum_shifts and fact_connect_hour_clock_times, applying the date filter in each, and put the results (just the primary keys, or all the columns you need) into three temp tables such as #fact_connect_hours, #fact_connect_hour_hum_shifts and #fact_connect_hour_clock_times.
Use the statement as is, but replace the fact tables with the temps (or join the temps back to the real tables if the temps only hold the primary keys).
Add indexes (if not already present) to the columns fact_connect_hours.dimDateId, fact_connect_hour_hum_shifts.dimDateId and fact_connect_hour_clock_times.dimDateId.
This way you filter down to exactly what you need in the simplest possible steps, and the complicated query then operates on a preset, small number of rows; even a bad plan applied to a few rows is practically harmless.
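A minimal sketch of those steps, assuming the temp tables hold full rows and reusing the 732 date filter from the question:
SELECT * INTO #fact_connect_hours
FROM fact_connect_hours WHERE dimDateId >= 732;

SELECT * INTO #fact_connect_hour_hum_shifts
FROM fact_connect_hour_hum_shifts WHERE dimDateId >= 732;

SELECT * INTO #fact_connect_hour_clock_times
FROM fact_connect_hour_clock_times WHERE dimDateId >= 732;

-- then run the original CTE query, swapping each fact table for its #temp equivalent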
A lesser detail: pay attention to the INNER JOIN to dim_status. If there is no foreign key constraint, the cardinality estimator may misestimate the number of returned rows because it cannot understand the relationship between the tables.
I can also see an attempt at optimization in that the filter has been pushed up into the CTE. That is similar to what I propose, only less restrictive; my approach forces the row filtering all the way down to the source tables.
I'm a database newbie when it comes to even moderately large data sets. I have a SQL database (multiple SQL databases actually: a SQLite, Postgres, and MySQL database) all containing the same data, dumped from IMDB. I want to benchmark these different databases. The main table I want to query has about 15 million rows. I want a query that crosses two movies; right now my query looks like this
SELECT * from acted_in INNER JOIN actors
ON acted_in.idactors = actors.idactors WHERE
(acted_in.idmovies = %d OR acted_in.idmovies = %d)
The parameters are randomly generated ids. I want to test the relative speed of the databases by running this query many times for randomly generated movies and measuring the average time it takes. My question is: is there any better way to write the same query? I want to join who acted in what with their information for either of the two movies, as this will be the core functionality of the project I am working on. Right now the speed is abysmal; the current average time for a single query is
sqlite: 7.160171360969543
postgres: 8.263306670188904
mysql: 13.27652293920517
This is the average time per query (a sample of only 100 queries, but it is significant enough for now). So can I do any better? The current run time is completely unacceptable for any practical use. I don't think the join itself takes much time; removing it gives nearly the same results, so I believe the lookup is what is slow, since I don't gain a significant speed-up when I drop either the join or the OR condition.
The thing you don't mention here is having any indexes in the databases. Generally, the way you speed up a query (except for terribly written ones, which this is not) is by adding indexes to the things which are used in join or where criteria. This would slow down updates since the indexes need to be updated any time the table is updated, but would speed up selections using those attributes quite substantially. You may wish to consider adding indexes to any attributes you use which are not already primary keys. Be sure to use the same index type in all databases to be fair.
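For example (the index names are made up; the columns come from the query in the question), something along these lines works in all three databases:
CREATE INDEX idx_acted_in_idmovies ON acted_in (idmovies);
CREATE INDEX idx_acted_in_idactors ON acted_in (idactors);
-- actors.idactors is presumably already the primary key; if not, index it as well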
First off, microbenchmarks on databases are pretty noninformative and it's not a good idea to base your decision on them. There are dozens of better criteria to select a db, like reliability, behavior under heavy loads, availability of certain features (eg an extensible language like the PostGIS extension for postgres, partitioning, ...), license (!!), and so on.
Second, if you want to tune your database, or database server, there's a number of things you have to consider. Some important ones:
DBs like lots of memory and fast disks, so set up your server with ample quantities of both.
use the query analysis features offered by all major DBs (e.g. the very visual EXPLAIN feature in pgAdmin for Postgres) to analyze the behavior of queries that are important for your use case, and adapt the DB based on what you learn from these analyses (e.g. extra or different indexes); see the example after this list
study your DB server well; these are pretty sophisticated programs with lots of settings that influence their behavior and performance
make sure you understand the workload your DB is subjected to, e.g. by using a tool like pgFouine for Postgres; similar tools exist for other databases.
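As an example of that analysis step, in Postgres you could run the question's query under EXPLAIN ANALYZE (the movie ids are placeholders):
EXPLAIN ANALYZE
SELECT *
FROM acted_in
INNER JOIN actors ON acted_in.idactors = actors.idactors
WHERE acted_in.idmovies = 12345 OR acted_in.idmovies = 67890;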
Suddenly (but unfortunately I don't know when "suddenly" was; I know it ran fine at some point in the past) one of my queries started taking 7+ seconds instead of milliseconds to execute. I have 1 local table and 3 tables being accessed via a DB link. The 3 remote tables are joined together, and one of them is joined with my local table.
The local table's where clause only takes a few millis to execute on its own, and only returns a few (10's or 100's at the most) records. The 3 remote tables have many hundreds of thousands, possibly millions, of records between them, and if I join them appropriately I get tens or hundreds of thousands of records.
I am only joining with the remote tables so that I can pull out a few pieces of data related to each record in my local table.
What appears to be happening, however, is that Oracle joins the remote tables together first and then my local table to that mess at the end. This is always going to be a bad idea, especially given the data set that exists right now, so I added a /*+ LEADING(local_tab remote_tab_1) */ hint to my query and it now returns in milliseconds.
I compared the explain plans and they are almost identical, save for a single BUFFER SORT on one of the remote tables.
I'm wondering what might cause Oracle to approach this the wrong way? Is it an index issue? What should I be looking for?
When choosing an execution plan, Oracle estimates costs for the different candidate plans. One crucial piece of information for that estimate is the number of rows that will be returned by each step of the execution plan. Oracle estimates these using 'statistics', i.e. information about how many rows a table contains, how many distinct values a column contains, and how evenly those values are distributed.
These statistics are just that: statistics. They can be wrong, and that is one of the most common reasons for misjudgments by the Oracle optimizer.
So gathering new statistics as described in a comment might help. Have a look at the documentation for the dbms_stats package; there are many different ways to call it.
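For example, one common (hedged) way to refresh statistics for a single table; the schema and table names are placeholders:
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'MY_SCHEMA', tabname => 'MY_TABLE', cascade => TRUE);
END;
/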
A common problem I've come across is a query that joins many tables, where the joins form a chain from one end to another, e.g.:
SELECT *
FROM tableA, tableB, tableC, tableD, tableE
WHERE tableA.ID0 = :bind1
AND tableA.ID1 = tableB.ID1
AND tableB.ID2 = tableC.ID2
AND tableC.ID3 = tableD.ID3
AND tableD.ID4 = tableE.ID4
AND tableE.ID5 = :bind2;
Notice how the optimiser might choose to drive the query from tableA (e.g. if the index on ID0 is nicely selective) or from tableE (if the index on tableE.ID5 is more selective).
The statistics on the tables might cause the choice between these two plans to balance on a knife-edge; one day it's working fine (driving from tableA), next day new stats are gathered and all of a sudden the alternative plan driving from tableE has a lower cost and is chosen.
In this circumstance, adding a LEADING hint is one way to nudge it back to the original plan (i.e. drive from tableA) without dictating too much to the optimiser (i.e. it doesn't force the optimiser to choose any particular join methods).
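For the example above, nudging the optimiser to drive from tableA might look like this (a sketch, not a guaranteed fix):
SELECT /*+ LEADING(tableA) */ *
FROM tableA, tableB, tableC, tableD, tableE
WHERE tableA.ID0 = :bind1
AND tableA.ID1 = tableB.ID1
AND tableB.ID2 = tableC.ID2
AND tableC.ID3 = tableD.ID3
AND tableD.ID4 = tableE.ID4
AND tableE.ID5 = :bind2;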
You're doing distributed query optimization, and that's a tricky beast. It could be that your local table's statistics are current, but the tables at the remote system are now out of whack or have changed. Or the remote system added/removed/modified indexes, and that broke your plan. (This is an excellent reason to consider replication, so you can control the indexes and statistics on a local copy.)
That said, Oracle's estimate of cardinality is a primary driver in execution plan. A 10053 trace analysis (Jonathan Lewis' Cost-Based Oracle Fundamentals book has wonderful examples from 8i to 10.1) can help shed light on why your statement's now broken and how the LEADING hint fixes it.
The DRIVING_SITE hint might be a better choice if you know you always want the local tables to be joined first before going after the remote site; it clarifies your intention without driving the plan the way a LEADING hint would.
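A hedged illustration of the syntax (the table and database link names are invented); the hint names a table, and the query is executed at the site where that table lives:
SELECT /*+ DRIVING_SITE(l) */ l.id, r.some_col
FROM local_tab l, remote_tab@remote_db r
WHERE l.id = r.id;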
Might not be relevant but I had a similar situation once where the remote table had been replaced by a single-table view. When it was a table the distributed query optimizer 'saw' that it had an index. When it became a view it couldn't see the index anymore and couldn't cost a plan that used an index on the remote object.
That was a few years ago. I documented my analysis at the time here.
RI,
It's hard to be sure about the cause of the performance problems without seeing the SQL.
When an Oracle query was performing well before, and suddenly starts performing badly, it is usually related to one of two issues:
A) Statistics are out of date. This is the easiest and quickest thing to check, even if you have a housekeeping batch process that's supposed to take care of it ... always double-check.
B) Data volume / data pattern change.
In your case, running a distributed query across multiple databases makes it 10x harder for Oracle to manage performance between them. Is it possible to put these tables in one database, perhaps as separate schema owners within it?
Hints are notoriously fragile, as Oracle is under no obligation to follow them. When the data volume or pattern changes some more, Oracle may just ignore the hint and do what it thinks is best (i.e. worst ;-).
If you cannot put these tables all in one database, then I recommend you look to break your query up into two statements:
INSERT on sub-SELECT to copy external data to a global temporary table in your current database.
SELECT from the global temporary table to join with your other table.
You will have complete control over performance of step 1 above without resorting to hints. This approach typically scales well, providing you take time to do the performance tuning. I've seen this approach solve many complex performance problems.
The overhead for Oracle to create a whole new table, or insert a heap of records, is much smaller than most people expect. Defining a global temporary table further reduces that overhead.
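A minimal sketch of the two-step approach; every object and column name here is an assumption:
CREATE GLOBAL TEMPORARY TABLE gtt_remote_rows (
  id       NUMBER,
  some_col VARCHAR2(100)
) ON COMMIT PRESERVE ROWS;

INSERT INTO gtt_remote_rows (id, some_col)
SELECT r.id, r.some_col
FROM   remote_tab@remote_db r
WHERE  r.id IN (SELECT l.remote_id FROM local_tab l WHERE l.status = 'OPEN');

SELECT l.*, g.some_col
FROM   local_tab l
JOIN   gtt_remote_rows g ON g.id = l.remote_id;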
Matthew
What techniques can be applied effectively to improve the performance of SQL queries? Are there any general rules that apply?
Use primary keys
Avoid select *
Be as specific as you can when building your conditional statements
De-normalisation can often be more efficient
Table variables and temporary tables (where available) will often be better than using a large source table
Partitioned views
Employ indices and constraints
Learn what's really going on under the hood - you should be able to understand the following concepts in detail:
Indexes (not just what they are but actually how they work).
Clustered indexes vs heap allocated tables.
Text and binary lookups and when they can be in-lined.
Fill factor.
How records are ghosted for update/delete.
When page splits happen and why.
Statistics, and how they affect various query speeds.
The query planner, and how it works for your specific database (for instance on some systems "select *" is slow, on modern MS-Sql DBs the planner can handle it).
The biggest thing you can do is to look for table scans in SQL Server Query Analyzer (make sure you turn on "show execution plan"). Otherwise there are a myriad of articles at MSDN and elsewhere that will give good advice.
As an aside, when I started learning to optimize queries I ran SQL Server Profiler against a trace, looked at the generated SQL, and tried to figure out why that was an improvement. Profiler is far from optimal, but it's a decent start.
There are a couple of things you can look at to optimize your query performance.
Ensure that you just have the minimum of data. Make sure you select only the columns you need. Reduce field sizes to a minimum.
Consider de-normalising your database to reduce joins
Avoid loops (i.e. fetch cursors), stick to set operations.
Implement the query as a stored procedure as this is pre-compiled and will execute faster.
Make sure that you have the correct indexes set up. If your database is used mostly for searching then consider more indexes.
Use the execution plan to see how the processing is done. What you want to avoid is a table scan as this is costly.
Make sure that the Auto Statistics is set to on. SQL needs this to help decide the optimal execution. See Mike Gunderloy's great post for more info. Basics of Statistics in SQL Server 2005
Make sure your indexes are not fragmented. Reducing SQL Server Index Fragmentation
Make sure your tables are not fragmented. How to Detect Table Fragmentation in SQL Server 2000 and 2005
Use a WITH statement to handle query filtering.
Limit each subquery to the minimum number of rows possible,
then join the subqueries.
WITH
master AS
(
SELECT SSN, FIRST_NAME, LAST_NAME
FROM MASTER_SSN
WHERE STATE = 'PA' AND
GENDER = 'M'
),
taxReturns AS
(
SELECT SSN, RETURN_ID, GROSS_PAY
FROM MASTER_RETURNS
WHERE YEAR < 2003 AND
YEAR > 2000
)
SELECT *
FROM master,
taxReturns
WHERE master.ssn = taxReturns.ssn
Subqueries within a WITH statement may end up being treated the same as inline views or automatically generated temp tables. In the work I do (retail data), I find there is a performance benefit about 70-80% of the time.
100% of the time, there is a maintenance benefit.
I think using SQL query analyzer would be a good start.
In Oracle you can look at the explain plan to compare variations on your query
Make sure that you have the right indexes on the table. If you frequently use a column to order or limit your dataset, an index can make a big difference. I saw in a recent article that SELECT DISTINCT can really slow down a query, especially if you have no index.
The obvious optimization for SELECT queries is ensuring you have indexes on columns used for joins or in WHERE clauses.
Since adding indexes can slow down data writes, you do need to monitor performance to ensure you don't kill the DB's write performance, but that's where a good query analysis tool can help you balance things accordingly.
Indexes
Statistics
On the Microsoft stack, the Database Engine Tuning Advisor
Some other points (mine are based on SQL Server; since each DB backend has its own implementation, they may or may not hold true for all databases):
Avoid correlated subqueries in the SELECT part of a statement; they are essentially cursors.
Design your tables to use the correct datatypes to avoid having to apply functions on them to get the data out. It is far harder to do date math when you store your data as varchar for instance.
If you find that you are frequently doing joins that have functions in them, then you need to think about redesigning your tables.
If your WHERE or JOIN conditions include OR statements (which are slower) you may get better speed using a UNION statement.
UNION ALL is faster than UNION if (and only if) the two statements are mutually exclusive and return the same results either way.
NOT EXISTS is usually faster than NOT IN or using a left join with a WHERE clause of ID IS NULL (see the sketch after this list).
In an UPDATE query add a WHERE condition to make sure you are not updating values that are already equal. The difference between updating 10,000,000 records and 4 can be quite significant!
Consider pre-calculating some values if you will be querying them frequently or for large reports. A sum of the values in an order only needs to be computed when the order is created or adjusted, rather than when you are summarizing the results of 10,000,000 orders in a report. Pre-calculations should be done in triggers so that they are always up to date as the underlying data changes. And it doesn't have to be just numbers: we have a calculated field that concatenates names that we use in reports.
Be wary of scalar UDFs; they can be slower than putting the code inline.
Temp tables tend to be faster for large data sets and table variables faster for small ones. In addition, you can index temp tables.
Formatting is usually faster in the user interface than in SQL.
Do not return more data than you actually need.
This one seems obvious, but you would not believe how often I end up fixing this. Do not join to tables that you are not using either to filter the records or to supply one of the fields in the SELECT part of the statement. Unnecessary joins can be very expensive.
It is a very bad idea to create views that call other views that call other views. You may find you are joining to the same table 6 times when you only need to once, and creating 10,000,000 records in an underlying view in order to get the 6 that are in your final result.
In designing a database, think about reporting, not just the user interface to enter data. Data is useless if it is not used, so think about how it will be used after it is in the database and how that data will be maintained or audited. That will often change the design. (This is one reason why it is a poor idea to let an ORM design your tables; it is only thinking about one use case for the data.) The most complex queries affecting the most data are in reporting, so designing changes to help reporting can speed up queries (and simplify them) considerably.
Database-specific implementations of features can be faster than using standard SQL (That's one of the ways they sell their product), so get to know your database features and find out which are faster.
And because it can't be said too often, use indexes correctly, not too many or too few. And make your WHERE clauses sargable (Able to use indexes).
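As promised above, a hedged sketch of the NOT EXISTS form; the table and column names are invented:
SELECT c.CustomerID
FROM Customers c
WHERE NOT EXISTS (SELECT 1
                  FROM Orders o
                  WHERE o.CustomerID = c.CustomerID);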
When you have a query or stored procedure that needs performance tuning, what are some of the first things you try?
Here is the handy-dandy list of things I always give to someone asking me about optimisation.
We mainly use Sybase, but most of the advice will apply across the board.
SQL Server, for example, comes with a host of performance monitoring / tuning bits, but if you don't have anything like that (and maybe even if you do) then I would consider the following...
99% of problems I have seen are caused by putting too many tables in a join. The fix for this is to do half the join (with some of the tables) and cache the results in a temporary table. Then do the rest of the query joining on that temporary table.
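A hedged sketch of that half-join-into-a-temp-table pattern (all names are invented):
SELECT a.key_col, a.col1, b.col2
INTO   #half_join
FROM   tableA a
JOIN   tableB b ON b.key_col = a.key_col
WHERE  a.some_filter = 'X'

SELECT h.*, c.col3
FROM   #half_join h
JOIN   tableC c ON c.key_col = h.key_col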
Query Optimisation Checklist
Run UPDATE STATISTICS on the underlying tables
Many systems run this as a scheduled weekly job
Delete records from underlying tables (possibly archive the deleted records)
Consider doing this automatically once a day or once a week.
Rebuild Indexes
Rebuild Tables (bcp data out/in)
Dump / Reload the database (drastic, but might fix corruption)
Build new, more appropriate index
Run DBCC to see if there is possible corruption in the database
Locks / Deadlocks
Ensure no other processes running in database
Especially DBCC
Are you using row or page level locking?
Lock the tables exclusively before starting the query
Check that all processes are accessing tables in the same order
Are indices being used appropriately?
Joins will only use an index if both expressions are exactly the same data type
An index will only be used if the first field(s) of the index are matched in the query
Are clustered indices used where appropriate?
range data
WHERE field between value1 and value2
Small Joins are Nice Joins
By default the optimiser will only consider the tables 4 at a time.
This means that in joins with more than 4 tables, it has a good chance of choosing a non-optimal query plan
Break up the Join
Can you break up the join?
Pre-select foreign keys into a temporary table
Do half the join and put results in a temporary table
Are you using the right kind of temporary table?
#temp tables may perform much better than @table variables with large volumes (thousands of rows).
Maintain Summary Tables
Build with triggers on the underlying tables
Build daily / hourly / etc.
Build ad-hoc
Build incrementally or teardown / rebuild
See what the query plan is with SET SHOWPLAN ON
See what's actually happening with SET STATS IO ON
Force an index using the pragma: (index: myindex)
Force the table order using SET FORCEPLAN ON
Parameter Sniffing:
Break Stored Procedure into 2
call proc2 from proc1
allows optimiser to choose index in proc2 if @parameter has been changed by proc1
Can you improve your hardware?
What time are you running? Is there a quieter time?
Is Replication Server (or other non-stop process) running? Can you suspend it? Run it eg. hourly?
Have a pretty good idea of the optimal path of running the query in your head.
Check the query plan - always.
Turn on STATS, so that you can examine both IO and CPU performance. Focus on driving those numbers down, not necessarily the query time (as that can be influenced by other activity, cache, etc.).
Look for large numbers of rows coming into an operator, but small numbers coming out. Usually, an index would help by limiting the number of rows coming in (which saves disk reads).
Focus on the largest cost subtree first. Changing that subtree can often change the entire query plan.
Common problems I've seen are:
If there are a lot of joins, sometimes SQL Server will choose to expand the joins and then apply the WHERE clauses. You can usually fix this by moving the WHERE conditions into the JOIN clause, or into a derived table with the conditions inlined (see the sketch after this list). Views can cause the same problems.
Suboptimal joins (LOOP vs HASH vs MERGE). My rule of thumb is to use a LOOP join when the top input has very few rows compared to the bottom, a MERGE when the sets are roughly equal and ordered, and a HASH for everything else. Adding a join hint will let you test your theory.
Parameter sniffing. If you ran the stored proc with unrealistic values at first (say, for testing), then the cached query plan may be suboptimal for your production values. Running again WITH RECOMPILE should verify this. For some stored procs, especially those that deal with varying sized ranges (say, all dates between today and yesterday - which would entail an INDEX SEEK - or, all dates between last year and this year - which would be better off with an INDEX SCAN) you may have to run it WITH RECOMPILE every time.
Bad indentation...Okay, so Sql Server doesn't have an issue with this - but I sure find it impossible to understand a query until I've fixed up the formatting.
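As mentioned in the first point above, here is a hedged sketch of pushing a filter into a derived table (all names are invented):
SELECT o.OrderID, c.CustomerName
FROM Orders o
INNER JOIN (SELECT CustomerID, CustomerName
            FROM Customers
            WHERE Region = 'West') c
    ON c.CustomerID = o.CustomerID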
Slightly off topic but if you have control over these issues...
High level and High Impact.
For high-IO environments, make sure your disks are set up as RAID 10 or RAID 0+1, or some nested implementation of RAID 1 and RAID 0.
Don't use drives less than 1500K.
Make sure your disks are used only for your database, i.e. no logs, no OS.
Turn off autogrow or similar features. Let the database pre-allocate all the storage that is anticipated, not just what is currently being used.
Design your schema and indexes for the types of queries you run.
If it's a log-type table (insert only) and must be in the DB, don't index it.
If you're doing a lot of reporting (complex selects with many joins) then you should look at creating a data warehouse with a star or snowflake schema.
Don't be afraid of replicating data in exchange for performance!
CREATE INDEX
Assure there are indexes available for your WHERE and JOIN clauses. This will speed data access greatly.
If your environment is a data mart or warehouse, indexes should abound for almost any conceivable query.
In a transactional environment, the number of indexes should be lower and their definitions more strategic so that index maintenance doesn't drag down resources. (Index maintenance is when the leaves of an index must be changed to reflect a change in the underlying table, as with INSERT, UPDATE, and DELETE operations.)
Also, be mindful of the order of fields in the index - the more selective (higher cardinality) a field, the earlier in the index it should appear. For example, say you're querying for used automobiles:
SELECT i.make, i.model, i.price
FROM dbo.inventory i
WHERE i.color = 'red'
AND i.price BETWEEN 15000 AND 18000
Price generally has higher cardinality. There may be only a few dozen colors available, but quite possibly thousands of different asking prices.
Of these index choices, idx01 provides the faster path to satisfy the query:
CREATE INDEX idx01 ON dbo.inventory (price, color)
CREATE INDEX idx02 ON dbo.inventory (color, price)
This is because fewer cars will satisfy the price point than the color choice, giving the query engine far less data to analyze.
I've been known to have two very similar indexes differing only in the field order to speed queries (firstname, lastname) in one and (lastname, firstname) in the other.
Assuming MySQL here, use EXPLAIN to find out what is going on with the query, make sure that the indexes are being used as efficiently as possible and try to eliminate file sorts. High Performance MySQL: Optimization, Backups, Replication, and More is a great book on this topic as is MySQL Performance Blog.
A trick I recently learned is that SQL Server can update local variables as well as fields, in an update statement.
UPDATE table
SET @variable = column = @variable + otherColumn
Or the more readable version:
UPDATE table
SET
@variable = @variable + otherColumn,
column = @variable
I've used this to replace complicated cursors/joins when implementing recursive calculations, and also gained a lot in performance.
Here's details and example code that made fantastic improvements in performance:
Link
@Terrapin there are a few other differences between isnull and coalesce that are worth mentioning (besides ANSI compliance, which is a big one for me).
Coalesce vs. IsNull
Sometimes in SQL Server if you use an OR in a where clause it will really jack with performance. Instead of using the OR just do two selects and union them together. You get the same results at 1000x the speed.
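A hedged sketch of the rewrite (the table and column names are invented); UNION, rather than UNION ALL, keeps rows that match both conditions from appearing twice:
-- original form
SELECT * FROM Customers WHERE LastName = 'Smith' OR City = 'Denver'
-- rewritten form
SELECT * FROM Customers WHERE LastName = 'Smith'
UNION
SELECT * FROM Customers WHERE City = 'Denver'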
Look at the where clause - verify use of indexes / verify nothing silly is being done
where SomeComplicatedFunctionOf(table.Column) = @param --silly
I'll generally start with the joins - I'll knock each one of them out of the query one at a time and re-run the query to get an idea if there's a particular join I'm having a problem with.
On all of my temp tables, I like to add unique constraints (where appropriate) to make indexes, and primary keys (almost always).
declare @temp table(
RowID int not null identity(1,1) primary key,
SomeUniqueColumn varchar(25) not null,
SomeNotUniqueColumn varchar(50) null,
unique(SomeUniqueColumn)
)
@DavidM
Assuming MySQL here, use EXPLAIN to find out what is going on with the query, make sure that the indexes are being used as efficiently as possible...
In SQL Server, execution plan gets you the same thing - it tells you what indexes are being hit, etc.
Not necessarily a SQL performance trick per se, but definitely related:
A good idea would be to use memcached where possible, as it is much faster to fetch precompiled data directly from memory than to get it from the database. There is also a flavour of MySQL that has memcached built in (third party).
Make sure your index lengths are as small as possible. This allows the DB to read more keys at a time from the file system, thus speeding up your joins. I assume this works with all DB's, but I know it's a specific recommendation for MySQL.
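In MySQL, for example, one way to keep key lengths down is a prefix index on a long string column (the table and column names are assumptions):
CREATE INDEX idx_actors_name ON actors (name(10));
-- indexes only the first 10 characters of name, keeping the key short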
I've made it a habit to always use bind variables. It's possible bind variables won't help if the RDBMS doesn't cache SQL statements. But if you don't use bind variables the RDBMS doesn't have a chance to reuse query execution plans and parsed SQL statements. The savings can be enormous: http://www.akadia.com/services/ora_bind_variables.html. I work mostly with Oracle, but Microsoft SQL Server works pretty much the same way.
In my experience, if you don't know whether or not you are using bind variables, you probably aren't. If your application language doesn't support them, find one that does. Sometimes you can fix query A by using bind variables for query B.
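The difference is simply a placeholder instead of a literal; for example, in Oracle-style SQL (names are made up):
-- literal: parsed and planned again for every distinct value
SELECT * FROM orders WHERE customer_id = 42;
-- bind variable: one shared plan, reused across values
SELECT * FROM orders WHERE customer_id = :cust_id;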
After that, I talk to our DBA to find out what's causing the RDBMS the most pain. Note that you shouldn't ask "Why is this query slow?" That's like asking your doctor to take out your appendix. Sure, your query might be the problem, but it's just as likely that something else is going wrong. As developers, we tend to think in terms of lines of code. If a line is slow, fix that line. But a RDBMS is a really complicated system and your slow query might be the symptom of a much larger problem.
Way too many SQL tuning tips are cargo cult idols. Most of the time the problem is unrelated or minimally related to the syntax you use, so it's normally best to use the cleanest syntax you can. Then you can start looking at ways to tune the database (not the query). Only tweak the syntax when that fails.
Like any performance tuning, always collect meaningful statistics. Don't use wallclock time unless it's the user experience you are tuning. Instead look at things like CPU time, rows fetched and blocks read off of disk. Too often people optimize for the wrong thing.
First step:
Look at the Query Execution Plan!
TableScan -> bad
NestedLoop -> meh warning
TableScan behind a NestedLoop -> DOOM!
SET STATISTICS IO ON
SET STATISTICS TIME ON
Running the query using WITH (NoLock) is pretty much standard operation in my place. Anyone caught running queries on the tens-of-gigabytes tables without it is taken out and shot.
Convert NOT IN queries to LEFT OUTER JOINS if possible. For example if you want to find all rows in Table1 that are unused by a foreign key in Table2 you could do this:
SELECT *
FROM Table1
WHERE Table1.ID NOT IN (
SELECT Table1ID
FROM Table2)
But you get much better performance with this:
SELECT Table1.*
FROM Table1
LEFT OUTER JOIN Table2 ON Table1.ID = Table2.Table1ID
WHERE Table2.ID is null
Index the table(s) on the column(s) you filter by
Prefix all tables with dbo. to prevent recompilations.
View query plans and hunt for table/index scans.
In 2005, scour the management views for missing indexes.
I like to use
isnull(SomeColThatMayBeNull, '')
Over
coalesce(SomeColThatMayBeNull, '')
When I don't need the multiple argument support that coalesce gives you.
http://blog.falafel.com/2006/04/05/SQLServerArcanaISNULLVsCOALESCE.aspx
I look out for:
Unroll any CURSOR loops and convert into set based UPDATE / INSERT statements.
Look out for any application code that:
Calls an SP that returns a large set of records,
Then in the application, goes through each record and calls an SP with parameters to update records.
Convert this into a SP that does all the work in one transaction.
Any SP that does lots of string manipulation. It's evidence that the data is not structured correctly / normalised.
Any SP's that re-invent the wheel.
Any SP's that I can't understand what it's trying to do within a minute!
SET NOCOUNT ON
Usually the first line inside my stored procedures, unless I actually need to use @@ROWCOUNT.
In SQL Server, use the nolock directive. It allows the select command to complete without having to wait for other transactions to finish.
SELECT * FROM Orders (nolock) where UserName = 'momma'
Remove cursors wherever they are not necessary.
Remove function calls in Sprocs where a lot of rows will call the function.
My colleague used function calls (getting lastlogindate from a userid, for example) to return very wide recordsets.
Tasked with optimisation, I replaced the function calls in the sproc with the function's code: I got many sprocs' running time down from > 20 seconds to < 1.
Don't prefix Stored Procedure names with "sp_" because system procedures all start with "sp_", and SQL Server will have to search harder to find your procedure when it gets called.
Dirty reads -
set transaction isolation level read uncommitted
Prevents deadlocks where transactional integrity isn't absolutely necessary (which is usually true)
I always go to SQL Profiler (if it's a stored procedure with a lot of nesting levels) or the query execution planner (if it's a few SQL statements with no nesting) first. 90% of the time you can find the problem immediately with one of these two tools.