SELECT T.xhrs, T.eq_id, mt.CATEGORY, mt.MODEL_NAME,
ROUND((SUM (T.TOT_AVAIL_TIME-T.maintenance_TIME) / SUM(T.TOT_AVAIL_TIME))*100) AVAILABILITY,
ROUND((SUM(UTIL_TIME) / nullif(SUM(T.TOT_AVAIL_TIME-T.maintenance_TIME ),0) )*100) UTILIZATION,
ROUND(SUM(T.failedcmds)/
SUM(T.total_failedcmds),2)*100 failedcmds,
AVG(MAX_QL) MAX_QL,
AVG(AVG_QL) avg_ql
FROM db1 T,
db2 mt
WHERE T.eq_model = mt.eq_model
and TOT_AVAIL_TIME != 0
AND TOT_AVAIL_TIME IS NOT NULL
GROUP BY T.xhrs, T.eq_id, mt.CATEGORY, mt.MODEL_NAME
This query returns 2550 records and takes 12-15 seconds to run in SQL Developer; in MyBatis it takes 30-35 seconds. Is there anything wrong with the existing query? Is there any way to optimize the above query and bring the execution time down to under 5 seconds in SQL Developer and under 15 seconds in the ORM?
Explain Plan
Let me bring up a few points for us to think about regarding performance.
The fact of the matter is that MyBatis, or any other ORM, is going to be slower than native SQL execution. Some of the reasons for this are:
R1. MyBatis adds overhead to database calls. You gain flexibility, maintainability, and encapsulation, but you lose some performance.
R2. In most cases, MyBatis is going to map the ResultSet to Java objects, which adds overhead. SQL clients can work directly with cursors, which is faster, but in the long run it may be harder to maintain.
R3. In most cases, MyBatis is going to create a transaction for you, which adds overhead.
R4. In most cases, MyBatis is going to create and manage a cache for you, which adds overhead to the first select but speeds up subsequent selects.
R5. MyBatis can also help you with lazy loading and other data retrieval strategies.
R6. We should compare MyBatis executions against JDBC executions, rather than comparing MyBatis against a SQL client tool (such as SQL Developer), because there are variables that can obscure your results. For example, the client tool may not fetch all rows at once.
That being said, MyBatis gives you flexibility, better maintainability, an easy way to handle transactions, and an easy way to map tables to Java objects, but it takes some performance from you.
So, if you want to speed up your queries, there are a few things you should consider. See the MyBatis documentation:
C1. Use fetchSize in your queries.
C2. Use the cache wisely; see what kind of caching is needed. In some cases it makes sense not to use the cache at all. Example: <select ... useCache="false">
C3. Be aware of the "N+1 Selects Problem". The documentation has some insights about this characteristic of many ORMs.
C4. Try to use a lightweight transactionManager such as "JDBC"; keep in mind that aspect-oriented transactions (such as Spring's @Transactional) will add a little more overhead.
If the tables aren't indexed properly yet, you could get the most out of indexing T on eq_model and TOT_AVAIL_TIME, and indexing mt on eq_model.
Put T.eq_model and T.TOT_AVAIL_TIME in one index (in this order).
Note: since you also check TOT_AVAIL_TIME for NULL, I assume the column definition allows NULLs. In a plain single-column index, NULLs are not indexed, so a TOT_AVAIL_TIME IS NOT NULL filter won't use that index later. So either combine it with eq_model, or remove that filter and change the column to NOT NULL.
I used this workaround on nullable columns in JIRA: an index on (PROJECT_CAN_HAS_NULL ASC, '1').
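A minimal sketch of those suggestions against the tables from the question (the index names are invented):
CREATE INDEX db1_model_avail_ix ON db1 (eq_model, TOT_AVAIL_TIME);
CREATE INDEX db2_model_ix ON db2 (eq_model);
-- constant-suffix variant of the workaround, so rows where TOT_AVAIL_TIME is NULL
-- still land in the index and the IS NOT NULL filter can use it
CREATE INDEX db1_avail_nn_ix ON db1 (TOT_AVAIL_TIME, '1');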
Add an index on the eq_model column and check.
Please try with this query; I only combined the WHERE conditions on TOT_AVAIL_TIME.
SELECT T.xhrs, T.eq_id, mt.CATEGORY, mt.MODEL_NAME,
ROUND((SUM (T.TOT_AVAIL_TIME-T.maintenance_TIME) / SUM(T.TOT_AVAIL_TIME))*100) AVAILABILITY,
ROUND((SUM(UTIL_TIME) / nullif(SUM(T.TOT_AVAIL_TIME-T.maintenance_TIME ),0) )*100) UTILIZATION,
ROUND(SUM(T.failedcmds)/
SUM(T.total_failedcmds),2)*100 failedcmds,
AVG(MAX_QL) MAX_QL,
AVG(AVG_QL) avg_ql
FROM db1 T,
db2 mt
WHERE T.eq_model = mt.eq_model
AND (TOT_AVAIL_TIME != 0 AND TOT_AVAIL_TIME IS NOT NULL)
GROUP BY T.xhrs, T.eq_id, mt.CATEGORY, mt.MODEL_NAME
Related
This question is related to the question here; my question is rather from the perspective of a Java application.
How do I optimally query the database if I have a webservice that returns data from a database depending on what the client is allowed to get?
Does it make sense to do a select with the maximal set of columns and filter the values in Java code?
Or is it better to construct many SQL PreparedStatements/NativeQueries, each asking only for the columns that will actually be returned?
Is it a good compromise to construct basic queries determined only by the joined tables and not by the selected columns (all columns would be selected)?
What would be good both for the database and for a maintainable application? What I know at this time:
Most JDBC drivers/SQL databases optimize the SQL query and cache the optimized version for future requests. Every new query must be analysed and compared, an optimized version must be found or created ... and held in memory ... in the worst case every new variant will produce a new optimized version. Will the optimized version be the same for these two simple selects?
select name, surname from person
select name from person
Writing many variants of selects is ugly - every variant must be tested, maintained, and kept fast ...
No, it's not premature optimization. Only fetch columns that are of interest; fetching columns that are not of interest increases the amount of data sent over the network, which impacts performance.
Fetching extra columns can also prevent the database from satisfying the query from an index alone (a covering index), forcing it to access the table.
It depends; generally it's best, from a performance perspective, to filter in the database. If this gets too involved, it may be better from a readability perspective to do so at the client level.
I have a lot of records in a table. When I execute the following query it takes a lot of time. How can I improve the performance?
SET ROWCOUNT 10
SELECT StxnID
,Sprovider.description as SProvider
,txnID
,Request
,Raw
,Status
,txnBal
,Stxn.CreatedBy
,Stxn.CreatedOn
,Stxn.ModifiedBy
,Stxn.ModifiedOn
,Stxn.isDeleted
FROM Stxn,Sprovider
WHERE Stxn.SproviderID = SProvider.Sproviderid
AND Stxn.SProviderid = ISNULL(@pSProviderID,Stxn.SProviderid)
AND Stxn.status = ISNULL(@pStatus,Stxn.status)
AND Stxn.CreatedOn BETWEEN ISNULL(@pStartDate,getdate()-1) and ISNULL(@pEndDate,getdate())
AND Stxn.CreatedBy = ISNULL(@pSellerId,Stxn.CreatedBy)
ORDER BY StxnID DESC
The stxn table has more than 100,000 records.
The query is run from a report viewer in asp.net c#.
This is my go-to article when I'm trying to do a search query that has several search conditions which might be optional.
http://www.sommarskog.se/dyn-search-2008.html
The biggest problem with your query is the column = ISNULL(@column, column) syntax. MSSQL won't use an index for that. Consider changing it to (@column IS NULL OR column = @column).
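For example, a minimal sketch of that rewrite applied to one of the optional filters from the question (the data type of @pStatus is assumed here):
DECLARE @pStatus int;   -- in the real query this is the incoming parameter
SET @pStatus = NULL;
SELECT Stxn.StxnID
FROM Stxn
WHERE (@pStatus IS NULL OR Stxn.status = @pStatus);  -- seekable when @pStatus is supplied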
You should consider using the execution plan and look for missing indexes. Also, how long does it take to execute? What is slow for you?
Maybe you could also not return so many rows, but that is just a guess. Actually we need to see your table and indexes plus the execution plan.
Check sql-tuning-tutorial
For one, use SELECT TOP (n) instead of SET ROWCOUNT - the optimizer will have a much better chance that way. Another suggestion is to use a proper INNER JOIN instead of potentially ending up with a Cartesian product using the old-style table,table join syntax (this is not the case here, but it can happen much more easily with the old syntax). It should be:
...
FROM Stxn INNER JOIN Sprovider
ON Stxn.SproviderID = SProvider.Sproviderid
...
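For illustration, a sketch of the rewritten skeleton with both suggestions applied (the column list is abbreviated, and the optional-parameter filters are omitted; see the sargable rewrite discussed above):
SELECT TOP (10)
       Stxn.StxnID, Sprovider.description AS SProvider,
       Stxn.Status, Stxn.CreatedOn   -- remaining columns omitted for brevity
FROM Stxn
INNER JOIN Sprovider
        ON Stxn.SproviderID = Sprovider.Sproviderid
ORDER BY Stxn.StxnID DESC;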
And if you think 100K rows is a lot, or that this volume is a reason for slowness, you're sorely mistaken. Most likely you have really poor indexing strategies in place, possibly some parameter sniffing, possibly some implicit conversions... hard to tell without understanding the data types, indexes and seeing the plan.
There are a lot of things that could impact the performance of query. Although 100k records really isn't all that many.
Items to consider (in no particular order)
Hardware:
Is SQL Server memory constrained? In other words, does it have enough RAM to do its job? If it is swapping memory to disk, then this is a sure sign that you need an upgrade.
Is the machine disk constrained? In other words, are the drives fast enough to keep up with the queries you need to run? If it's memory constrained, then disk speed becomes a larger factor.
Is the machine processor constrained? For example, when you execute the query does the processor spike for long periods of time? Or, are there already lots of other queries running that are taking resources away from yours...
Database Structure:
Do you have indexes on the columns used in your where clause? If the tables do not have indexes then it will have to do a full scan of both tables to determine which records match.
Eliminate the ISNULL function calls. If this is a direct query, have the calling code validate the parameters and set default values before executing. If it is in a stored procedure, do the checks at the top of the sproc (see the sketch below). Unless you are executing this WITH RECOMPILE, which re-sniffs the parameter values on each run, those functions will have to be evaluated for each row.
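For a stored procedure, a minimal sketch of that check (the parameter names follow the question; in the real proc these would be declared as its parameters rather than local variables):
DECLARE @pStartDate datetime, @pEndDate datetime;
IF @pStartDate IS NULL SET @pStartDate = DATEADD(day, -1, GETDATE());
IF @pEndDate IS NULL SET @pEndDate = GETDATE();
-- ...the filter can then be written as a plain, index-friendly range:
-- AND Stxn.CreatedOn BETWEEN @pStartDate AND @pEndDate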
Network:
Is the network slow between you and the server? Depending on the amount of data pulled you could be pulling GB's of data across the wire. I'm not sure what is stored in the "raw" column. The first question you need to ask here is "how much data is going back to the client?" For example, if each record is 1MB+ in size, then you'll probably have disk and network constraints at play.
General:
I'm not sure what "slow" means in your question. Does it mean that the query is taking around 1 second to process or does it mean it's taking 5 minutes? Everything is relative here.
Basically, it is going to be impossible to give a hard answer without a lot of questions asked by you. All of these will bear out if you profile the queries, understand what and how much is going back to the client and watch the interactions amongst the various parts.
Finally depending on the amount of data going back to the client there might not be a way to improve performance short of hardware changes.
Make sure Stxn.SproviderID, Stxn.status, Stxn.CreatedOn, Stxn.CreatedBy, Stxn.StxnID and SProvider.Sproviderid all have indexes defined.
(NB -- you might not need all, but it can't hurt.)
I don't see much that can be done on the query itself, but I can see things being done on the schema :
Create an index / PK on Stxn.SproviderID
Create an index / PK on SProvider.Sproviderid
Create indexes on status, CreatedOn, CreatedBy, StxnID
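A sketch of those statements (the index and constraint names are invented; adjust to local conventions):
ALTER TABLE Sprovider ADD CONSTRAINT PK_Sprovider PRIMARY KEY (Sproviderid);
CREATE INDEX IX_Stxn_SproviderID ON Stxn (SproviderID);
CREATE INDEX IX_Stxn_CreatedOn ON Stxn (CreatedOn);
CREATE INDEX IX_Stxn_Status ON Stxn (Status);
CREATE INDEX IX_Stxn_CreatedBy ON Stxn (CreatedBy);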
Something to consider: When ROWCOUNT or TOP are used with an ORDER BY clause, the entire result set is created and sorted first and then the top 10 results are returned.
How does this run without the Order By clause?
I have a query that looks something like this:
select xmlelement("rootNode",
(case
when XH.ID is not null then
xmlelement("xhID", XH.ID)
else
xmlelement("xhID", xmlattributes('true' AS "xsi:nil"), XH.ID)
end),
(case
when XH.SER_NUM is not null then
xmlelement("serialNumber", XH.SER_NUM)
else
xmlelement("serialNumber", xmlattributes('true' AS "xsi:nil"), XH.SER_NUM)
end),
/*repeat this pattern for many more columns from the same table...*/
FROM XH
WHERE XH.ID = 'SOMETHINGOROTHER'
It's ugly and I don't like it, and it is also the slowest executing query (there are others of similar form, but much smaller and they aren't causing any major problems - yet). Maintenance is relatively easy as this is mostly a generated query, but my concern now is for performance. I am wondering how much of an overhead there is for all of these case expressions.
To see if there was any difference, I wrote another version of this query as:
select xmlelement("rootNode",
xmlforest(XH.ID, XH.SER_NUM,...
(I know that this query does not produce exactly the same thing; my plan was to move the logic for handling the renaming and the xsi:nil attribute to XSL, or maybe to PL/SQL.)
I tried to get execution plans for both versions, but they are the same. I'm guessing that the logic does not get factored into the execution plan. My gut tells me the second version should execute faster, but I'd like some way to prove that (other than writing a PL/SQL test function with timing statements before and after the query and running that code over and over again to get a test sample).
Is it possible to get a good idea of how much the case-when will cost?
Also, I could write the case-when using the decode function instead. Would that perform better (than case-statements)?
Just about anything in your SELECT list, unless it is a user-defined function which reads a table or view, or a nested subselect, can usually be neglected for the purpose of analyzing your query's performance.
Open your connection properties and set the value SET STATISTICS IO on. Check out how many reads are happening. View the query plan. Are your indexes being used properly? Do you know how to analyze the plan to see?
For the purposes of performance tuning you are dealing with this statement:
SELECT *
FROM XH
WHERE XH.ID = 'SOMETHINGOROTHER'
How does that query perform? If it returns in markedly less time than the XML version then you need to consider the performance of the functions, but I would be astonished if that were the case (oh ho!).
Does this return one row or several? If one row then you have only two things to work with:
is XH.ID indexed and, if so, is the index being used?
does the "many more columns from the same table" indicate a problem with chained rows?
If the query returns several rows then ... well, actually you have the same two things to work with. It's just that the emphasis is different with regard to indexes. If the index has a very poor clustering factor then it could be faster to avoid using the index in favour of a full table scan.
Beyond that you would need to look at physical problems - I/O bottlenecks, poor interconnects, a dodgy disk. The reason why your scope for tuning the query is so restricted is because - as presented - it is a single table, single column read. Most tuning is about efficient joining. Now if XH transpires to be a view over a complex query then it is a different matter.
You can use good old tkprof to analyze statistics, together with one of the many forms of ALTER SESSION that turn on stats gathering. The DBMS_PROFILER package also gathers statistics if your cursor is in a PL/SQL code block.
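For example, a minimal sketch of one such trace session, run from SQL*Plus against the table in the question (the trace file name is illustrative and environment-specific):
ALTER SESSION SET sql_trace = TRUE;
-- run the statement(s) you want to profile
SELECT COUNT(*) FROM XH WHERE XH.ID = 'SOMETHINGOROTHER';
ALTER SESSION SET sql_trace = FALSE;
-- then format the trace file from the OS shell:
-- tkprof ora_12345.trc report.txt sys=no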
What techniques can be applied effectively to improve the performance of SQL queries? Are there any general rules that apply?
Use primary keys
Avoid select *
Be as specific as you can when building your conditional statements
De-normalisation can often be more efficient
Table variables and temporary tables (where available) will often be better than using a large source table
Partitioned views
Employ indices and constraints
Learn what's really going on under the hood - you should be able to understand the following concepts in detail:
Indexes (not just what they are but actually how they work).
Clustered indexes vs heap allocated tables.
Text and binary lookups and when they can be in-lined.
Fill factor.
How records are ghosted for update/delete.
When page splits happen and why.
Statistics, and how they affect various query speeds.
The query planner, and how it works for your specific database (for instance on some systems "select *" is slow, on modern MS-Sql DBs the planner can handle it).
The biggest thing you can do is to look for table scans in sql server query analyzer (make sure you turn on "show execution plan"). Otherwise there are a myriad of articles at MSDN and elsewhere that will give good advice.
As an aside, when I started learning to optimize queries I ran sql server query profiler against a trace, looked at the generated SQL, and tried to figure out why that was an improvement. Query profiler is far from optimal, but it's a decent start.
There are a couple of things you can look at to optimize your query performance.
Ensure that you just have the minimum of data. Make sure you select only the columns you need. Reduce field sizes to a minimum.
Consider de-normalising your database to reduce joins
Avoid loops (i.e. fetch cursors), stick to set operations.
Implement the query as a stored procedure as this is pre-compiled and will execute faster.
Make sure that you have the correct indexes set up. If your database is used mostly for searching then consider more indexes.
Use the execution plan to see how the processing is done. What you want to avoid is a table scan as this is costly.
Make sure that Auto Statistics is set to on. SQL Server needs this to help decide on the optimal execution plan; a sketch of the relevant commands follows this list. See Mike Gunderloy's great post for more info: Basics of Statistics in SQL Server 2005
Make sure your indexes are not fragmented. Reducing SQL Server Index Fragmentation
Make sure your tables are not fragmented. How to Detect Table Fragmentation in SQL Server 2000 and 2005
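A sketch of the relevant T-SQL for the statistics and fragmentation points above (the database and table names are placeholders; ALTER INDEX requires SQL Server 2005 or later):
ALTER DATABASE MyDatabase SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE MyDatabase SET AUTO_UPDATE_STATISTICS ON;
-- refresh statistics and rebuild fragmented indexes on one table
UPDATE STATISTICS dbo.Orders;
ALTER INDEX ALL ON dbo.Orders REBUILD;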
Use a WITH statement (common table expression) to handle query filtering.
Limit each subquery to the minimum number of rows possible,
then join the subqueries.
WITH
master AS
(
SELECT SSN, FIRST_NAME, LAST_NAME
FROM MASTER_SSN
WHERE STATE = 'PA' AND
GENDER = 'M'
),
taxReturns AS
(
SELECT SSN, RETURN_ID, GROSS_PAY
FROM MASTER_RETURNS
WHERE YEAR < 2003 AND
YEAR > 2000
)
SELECT *
FROM master,
taxReturns
WHERE master.ssn = taxReturns.ssn
Subqueries within a WITH statement may end up being the same as inline views,
or automatically generated temp tables. I find in the work I do (retail data) that about 70-80% of the time there is a performance benefit.
100% of the time, there is a maintenance benefit.
I think using SQL query analyzer would be a good start.
In Oracle you can look at the explain plan to compare variations on your query
Make sure that you have the right indexes on the table. If you frequently use a column as a way to order or limit your dataset, an index can make a big difference. I saw in a recent article that SELECT DISTINCT can really slow down a query, especially if you have no index.
The obvious optimization for SELECT queries is ensuring you have indexes on columns used for joins or in WHERE clauses.
Since adding indexes can slow down data writes, you do need to monitor performance to ensure you don't kill the DB's write performance, but that's where using a good query analysis tool can help you balance things accordingly.
Indexes
Statistics
On the Microsoft stack, the Database Engine Tuning Advisor
Some other points (mine are based on SQL Server; since each DB backend has its own implementation, they may or may not hold true for all databases):
Avoid correlated subqueries in the select part of a statement, they are essentially cursors.
Design your tables to use the correct datatypes to avoid having to apply functions on them to get the data out. It is far harder to do date math when you store your dates as varchar, for instance.
If you find that you are frequently doing joins that have functions in them, then you need to think about redesigning your tables.
If your WHERE or JOIN conditions include OR statements (which are slower) you may get better speed using a UNION statement.
UNION ALL is faster than UNION if (and only if) the two statements are mutually exclusive and return the same results either way.
NOT EXISTS is usually faster than NOT IN or using a left join with a WHERE clause of ID = null (a sketch comparing the two forms follows this list).
In an UPDATE query add a WHERE condition to make sure you are not updating values that are already equal. The difference between updating 10,000,000 records and 4 can be quite significant!
Consider pre-calculating some values if you will be querying them frequently or for large reports. A sum of the values in an order only needs to be done when the order is made or adjusted, rather than when you are summarizing the results of 10,000,000 orders in a report. Pre-calculations should be done in triggers so that they are always up to date as the underlying data changes. And it doesn't have to be just numbers either; we have a calculated field that concatenates names that we use in reports.
Be wary of scalar UDFs, they can be slower than putting the code in line.
Temp tables tend to be faster for large data sets and table variables faster for small ones. In addition, you can index temp tables.
Formatting is usually faster in the user interface than in SQL.
Do not return more data than you actually need.
This one seems obvious but you would not believe how often I end up fixing this. Do not join to tables unless you are using them to filter the records or actually referencing one of their fields in the select part of the statement. Unnecessary joins can be very expensive.
It is a very bad idea to create views that call other views that call other views. You may find you are joining to the same table 6 times when you only need to once, and creating 100,000,000 records in an underlying view in order to get the 6 that are in your final result.
In designing a database, think about reporting not just the user interface to enter data. Data is useless if it is not used, so think about how it will be used after it is in the database and how that data will be maintained or audited. That will often change the design. (This is one reason why it is a poor idea to let an ORM design your tables, it is only thinking about one use case for the data.) The most complex queries affecting the most data are in reporting, so designing changes to help reporting can speed up queries (and simplify them) considerably.
Database-specific implementations of features can be faster than using standard SQL (That's one of the ways they sell their product), so get to know your database features and find out which are faster.
And because it can't be said too often, use indexes correctly, not too many or too few. And make your WHERE clauses sargable (Able to use indexes).
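As promised above, a sketch comparing the two forms (Table1/Table2 are illustrative names, matching the NOT IN example that appears later on this page):
-- NOT IN: often slower, and silently matches nothing if the subquery ever returns a NULL
SELECT *
FROM Table1
WHERE Table1.ID NOT IN (SELECT Table1ID FROM Table2);
-- NOT EXISTS: usually the faster (and NULL-safe) equivalent
SELECT *
FROM Table1 t1
WHERE NOT EXISTS (SELECT 1 FROM Table2 t2 WHERE t2.Table1ID = t1.ID);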
When you have a query or stored procedure that needs performance tuning, what are some of the first things you try?
Here is the handy-dandy list of things I always give to someone asking me about optimisation.
We mainly use Sybase, but most of the advice will apply across the board.
SQL Server, for example, comes with a host of performance monitoring / tuning bits, but if you don't have anything like that (and maybe even if you do) then I would consider the following...
99% of problems I have seen are caused by putting too many tables in a join. The fix for this is to do half the join (with some of the tables) and cache the results in a temporary table. Then do the rest of the query joining on that temporary table.
Query Optimisation Checklist
Run UPDATE STATISTICS on the underlying tables
Many systems run this as a scheduled weekly job
Delete records from underlying tables (possibly archive the deleted records)
Consider doing this automatically once a day or once a week.
Rebuild Indexes
Rebuild Tables (bcp data out/in)
Dump / Reload the database (drastic, but might fix corruption)
Build new, more appropriate index
Run DBCC to see if there is possible corruption in the database
Locks / Deadlocks
Ensure no other processes running in database
Especially DBCC
Are you using row or page level locking?
Lock the tables exclusively before starting the query
Check that all processes are accessing tables in the same order
Are indices being used appropriately?
Joins will only use index if both expressions are exactly the same data type
Index will only be used if the first field(s) on the index are matched in the query
Are clustered indices used where appropriate?
range data
WHERE field between value1 and value2
Small Joins are Nice Joins
By default the optimiser will only consider the tables 4 at a time.
This means that in joins with more than 4 tables, it has a good chance of choosing a non-optimal query plan
Break up the Join
Can you break up the join?
Pre-select foreign keys into a temporary table
Do half the join and put results in a temporary table
Are you using the right kind of temporary table?
#temp tables may perform much better than @table variables with large volumes (thousands of rows).
Maintain Summary Tables
Build with triggers on the underlying tables
Build daily / hourly / etc.
Build ad-hoc
Build incrementally or teardown / rebuild
See what the query plan is with SET SHOWPLAN ON
See what’s actually happening with SET STATS IO ON
Force an index using the pragma: (index: myindex)
Force the table order using SET FORCEPLAN ON
Parameter Sniffing:
Break Stored Procedure into 2
call proc2 from proc1
allows optimiser to choose index in proc2 if @parameter has been changed by proc1
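A sketch of that split in SQL Server flavoured T-SQL (the procedure, table and parameter names are made up):
CREATE PROCEDURE proc2 @start_date datetime AS
    SELECT COUNT(*) FROM Orders WHERE OrderDate >= @start_date;
GO
CREATE PROCEDURE proc1 @start_date datetime = NULL AS
BEGIN
    -- adjust the parameter here...
    IF @start_date IS NULL SET @start_date = DATEADD(day, -7, GETDATE());
    -- ...so proc2 compiles its plan against the value it will actually run with
    EXEC proc2 @start_date;
END
GO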
Can you improve your hardware?
What time are you running? Is there a quieter time?
Is Replication Server (or other non-stop process) running? Can you suspend it? Run it eg. hourly?
Have a pretty good idea of the optimal path of running the query in your head.
Check the query plan - always.
Turn on STATS, so that you can examine both IO and CPU performance. Focus on driving those numbers down, not necessarily the query time (as that can be influenced by other activity, cache, etc.).
Look for large numbers of rows coming into an operator, but small numbers coming out. Usually, an index would help by limiting the number of rows coming in (which saves disk reads).
Focus on the largest cost subtree first. Changing that subtree can often change the entire query plan.
Common problems I've seen are:
If there are a lot of joins, sometimes SQL Server will choose to expand the joins and then apply the WHERE clauses. You can usually fix this by moving the WHERE conditions into the JOIN clause, or into a derived table with the conditions inlined. Views can cause the same problems.
Suboptimal joins (LOOP vs HASH vs MERGE). My rule of thumb is to use a LOOP join when the top row has very few rows compared to the bottom, a MERGE when the sets are roughly equal and ordered, and a HASH for everything else. Adding a join hint will let you test your theory.
Parameter sniffing. If you ran the stored proc with unrealistic values at first (say, for testing), then the cached query plan may be suboptimal for your production values. Running again WITH RECOMPILE should verify this. For some stored procs, especially those that deal with varying-sized ranges (say, all dates between today and yesterday - which would entail an INDEX SEEK - or all dates between last year and this year - which would be better off with an INDEX SCAN), you may have to run WITH RECOMPILE every time; a sketch follows this list.
Bad indentation... Okay, so SQL Server doesn't have an issue with this, but I sure find it impossible to understand a query until I've fixed up the formatting.
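Picking up the parameter-sniffing point, a sketch of the two WITH RECOMPILE forms (the procedure and table names are hypothetical):
-- recompile a single execution so the plan matches the values passed in
EXEC dbo.usp_SalesReport @StartDate = '20240101', @EndDate = '20240131' WITH RECOMPILE;
-- or bake it into the procedure definition for procs with highly variable ranges
CREATE PROCEDURE dbo.usp_SalesReport
    @StartDate datetime,
    @EndDate datetime
WITH RECOMPILE
AS
    SELECT COUNT(*) FROM dbo.Sales
    WHERE SaleDate BETWEEN @StartDate AND @EndDate;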
Slightly off topic but if you have control over these issues...
High level and High Impact.
For high-IO environments, make sure your disks are set up as either RAID 10 or RAID 0+1, or some nested implementation of RAID 1 and RAID 0.
Don't use drives slower than 15K RPM.
Make sure your disks are used only for your database, i.e. no logging, no OS.
Turn off auto-grow or similar features. Let the database claim all the storage that is anticipated, not just what is currently being used.
Design your schema and indexes for the types of queries you run.
If it's a log-type table (insert only) and it must be in the DB, don't index it.
If you're doing a lot of reporting (complex selects with many joins), then you should look at creating a data warehouse with a star or snowflake schema.
Don't be afraid of replicating data in exchange for performance!
CREATE INDEX
Ensure there are indexes available for your WHERE and JOIN clauses. This will speed data access greatly.
If your environment is a data mart or warehouse, indexes should abound for almost any conceivable query.
In a transactional environment, the number of indexes should be lower and their definitions more strategic so that index maintenance doesn't drag down resources. (Index maintenance is when the leaves of an index must be changed to reflect a change in the underlying table, as with INSERT, UPDATE, and DELETE operations.)
Also, be mindful of the order of fields in the index - the more selective (higher cardinality) a field, the earlier in the index it should appear. For example, say you're querying for used automobiles:
SELECT i.make, i.model, i.price
FROM dbo.inventory i
WHERE i.color = 'red'
AND i.price BETWEEN 15000 AND 18000
Price generally has higher cardinality. There may be only a few dozen colors available, but quite possibly thousands of different asking prices.
Of these index choices, idx01 provides the faster path to satisfy the query:
CREATE INDEX idx01 ON dbo.inventory (price, color)
CREATE INDEX idx02 ON dbo.inventory (color, price)
This is because fewer cars will satisfy the price point than the color choice, giving the query engine far less data to analyze.
I've been known to have two very similar indexes differing only in the field order to speed queries (firstname, lastname) in one and (lastname, firstname) in the other.
Assuming MySQL here, use EXPLAIN to find out what is going on with the query, make sure that the indexes are being used as efficiently as possible and try to eliminate file sorts. High Performance MySQL: Optimization, Backups, Replication, and More is a great book on this topic as is MySQL Performance Blog.
A trick I recently learned is that SQL Server can update local variables as well as fields, in an update statement.
UPDATE table
SET @variable = column = @variable + otherColumn
Or the more readable version:
UPDATE table
SET
@variable = @variable + otherColumn,
column = @variable
I've used this to replace complicated cursors/joins when implementing recursive calculations, and also gained a lot in performance.
Here are details and example code that made fantastic improvements in performance:
Link
@Terrapin: there are a few other differences between ISNULL and COALESCE that are worth mentioning (besides ANSI compliance, which is a big one for me).
Coalesce vs. IsNull
Sometimes in SQL Server, if you use an OR in a WHERE clause, it will really jack with performance. Instead of using the OR, just do two selects and UNION them together. You get the same results at 1000x the speed.
Look at the where clause - verify use of indexes / verify nothing silly is being done
where SomeComplicatedFunctionOf(table.Column) = @param --silly
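To make that concrete, a sketch of the usual fix (the table and column names are made up): leave the column bare and push the work onto the parameter side.
DECLARE @day datetime;
SET @day = '20240101';
-- silly: the function wrapped around the column defeats any index on CreatedOn
SELECT OrderID FROM dbo.Orders
WHERE CONVERT(varchar(10), CreatedOn, 120) = CONVERT(varchar(10), @day, 120);
-- sargable rewrite: compare the raw column against a range
SELECT OrderID FROM dbo.Orders
WHERE CreatedOn >= @day
  AND CreatedOn < DATEADD(day, 1, @day);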
I'll generally start with the joins - I'll knock each one of them out of the query one at a time and re-run the query to get an idea if there's a particular join I'm having a problem with.
On all of my temp tables, I like to add unique constraints (where appropriate) to make indexes, and primary keys (almost always).
declare @temp table(
RowID int not null identity(1,1) primary key,
SomeUniqueColumn varchar(25) not null,
SomeNotUniqueColumn varchar(50) null,
unique(SomeUniqueColumn)
)
@DavidM
Assuming MySQL here, use EXPLAIN to find out what is going on with the query, make sure that the indexes are being used as efficiently as possible...
In SQL Server, the execution plan gets you the same thing - it tells you what indexes are being hit, etc.
Not necessarily a SQL performance trick per se, but definitely related:
A good idea would be to use memcached where possible, as it would be much faster to fetch the precomputed data directly from memory rather than getting it from the database. There's also a flavour of MySQL that has memcached built in (third party).
Make sure your index lengths are as small as possible. This allows the DB to read more keys at a time from the file system, thus speeding up your joins. I assume this works with all DB's, but I know it's a specific recommendation for MySQL.
I've made it a habit to always use bind variables. It's possible bind variables won't help if the RDBMS doesn't cache SQL statements. But if you don't use bind variables the RDBMS doesn't have a chance to reuse query execution plans and parsed SQL statements. The savings can be enormous: http://www.akadia.com/services/ora_bind_variables.html. I work mostly with Oracle, but Microsoft SQL Server works pretty much the same way.
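A minimal Oracle/SQL*Plus sketch of the difference (the orders table and its columns are made up):
-- literal value: every distinct customer_id is a new statement text to hard-parse
SELECT order_id, status FROM orders WHERE customer_id = 42;
-- bind variable: one statement text, one reusable parsed plan
VARIABLE cust_id NUMBER
EXEC :cust_id := 42
SELECT order_id, status FROM orders WHERE customer_id = :cust_id;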
In my experience, if you don't know whether or not you are using bind variables, you probably aren't. If your application language doesn't support them, find one that does. Sometimes you can fix query A by using bind variables for query B.
After that, I talk to our DBA to find out what's causing the RDBMS the most pain. Note that you shouldn't ask "Why is this query slow?" That's like asking your doctor to take out your appendix. Sure, your query might be the problem, but it's just as likely that something else is going wrong. As developers, we tend to think in terms of lines of code. If a line is slow, fix that line. But a RDBMS is a really complicated system and your slow query might be the symptom of a much larger problem.
Way too many SQL tuning tips are cargo cult idols. Most of the time the problem is unrelated or minimally related to the syntax you use, so it's normally best to use the cleanest syntax you can. Then you can start looking at ways to tune the database (not the query). Only tweak the syntax when that fails.
Like any performance tuning, always collect meaningful statistics. Don't use wallclock time unless it's the user experience you are tuning. Instead look at things like CPU time, rows fetched and blocks read off of disk. Too often people optimize for the wrong thing.
First step:
Look at the Query Execution Plan!
TableScan -> bad
NestedLoop -> meh warning
TableScan behind a NestedLoop -> DOOM!
SET STATISTICS IO ON
SET STATISTICS TIME ON
Running the query using WITH (NoLock) is pretty much standard operation in my place. Anyone caught running queries on the tens-of-gigabytes tables without it is taken out and shot.
Convert NOT IN queries to LEFT OUTER JOINS if possible. For example if you want to find all rows in Table1 that are unused by a foreign key in Table2 you could do this:
SELECT *
FROM Table1
WHERE Table1.ID NOT IN (
SELECT Table1ID
FROM Table2)
But you get much better performance with this:
SELECT Table1.*
FROM Table1
LEFT OUTER JOIN Table2 ON Table1.ID = Table2.Table1ID
WHERE Table2.ID is null
Index the table(s) by the column(s) you filter by
Prefix all tables with dbo. to prevent recompilations.
View query plans and hunt for table/index scans.
In 2005, scour the management views for missing indexes.
I like to use
isnull(SomeColThatMayBeNull, '')
Over
coalesce(SomeColThatMayBeNull, '')
When I don't need the multiple argument support that coalesce gives you.
http://blog.falafel.com/2006/04/05/SQLServerArcanaISNULLVsCOALESCE.aspx
I look out for:
Unroll any CURSOR loops and convert into set based UPDATE / INSERT statements.
Look out for any application code that:
Calls an SP that returns a large set of records,
Then in the application, goes through each record and calls an SP with parameters to update records.
Convert this into a SP that does all the work in one transaction.
Any SP that does lots of string manipulation. It's evidence that the data is not structured correctly / normalised.
Any SP's that re-invent the wheel.
Any SP where I can't understand what it's trying to do within a minute!
SET NOCOUNT ON
Usually the first line inside my stored procedures, unless I actually need to use @@ROWCOUNT.
In SQL Server, use the NOLOCK directive. It allows the SELECT command to complete without having to wait for other transactions to finish.
SELECT * FROM Orders (nolock) where UserName = 'momma'
Remove cursors wherever they are not necessary.
Remove function calls in Sprocs where a lot of rows will call the function.
My colleague used function calls (getting lastlogindate from userid, for example) to return very wide recordsets.
Tasked with optimisation, I replaced the function calls in the sproc with the function's code: I got many sprocs' running time down from > 20 seconds to < 1.
Don't prefix Stored Procedure names with "sp_" because system procedures all start with "sp_", and SQL Server will have to search harder to find your procedure when it gets called.
Dirty reads -
set transaction isolation level read uncommitted
Prevents deadlocks where transactional integrity isn't absolutely necessary (which is usually true)
I always go to SQL Profiler (if it's a stored procedure with a lot of nesting levels) or the query execution planner (if it's a few SQL statements with no nesting) first. 90% of the time you can find the problem immediately with one of these two tools.