How does including a SQL index hint affect query performance? - sql

Say I have a table in a SQL 2005 database with 2,000,000+ records and a few indexes. What advantage is there to using index hints in my queries? Are there ever disadvantages to using index hints in queries?

First, try using SQL Profiler to generate a .trc file of activity in your database for a normal workload over a few hours. And then use the "Database Engine Tuning Advisor" on the SQL Server Management Studio Tools menu to see if it suggests any additional indexes, composite indexes, or covering indexes that may be beneficial.
I never use query hints and mostly work with multi-million row databases. They sometimes can affect performance negatively.

The key point that I believe everyone here is pointing to is that with VERY careful consideration the usage of index hints can improve the performance of your queries, IF AND ONLY IF, multiple indexes exist that could be used to retreive the data, AND if SQL Server is not using the correct one.
In my experience I have found that it is NOT very common to need Index hints, I believe I maybe have 2-3 queries that are in use today that have used them.... Proper index creation and database optimization should get you most of the way there to the performing database.

The index hint will only come into play where your query involves joining tables, and where the columns being used to join to the other table matches more than one index. In that case the database engine may choose to use one index to make the join, and from investigation you may know that if it uses another index the query will perform better. In that case you provide the index hint telling the database engine which index to use.

My experience is that sometimes you know more about your dataset then SQL Server does. In that case you should use query hints. In other words: You help the optimizer decide.
I once build a datawarehouse where SQL Server did not use the optimal index on a complex query. By giving an index hint in my query I managed to make a query go about 100 times faster.
Use them only after you analysed the query plan. If you think your query can run faster when using another index or by using them in a different order, give the server a hint.

Related

DB2 optimize large IN predicate

I have a table with over 300 million records only containing key, source and hash value. The inbuilt sql of the application runs a large IN predicate on hash value in its sql to fetch the data.The sqls are performing slowly so need suggestions on how to improve the performance of the sql. I wouldn't be able to change the sql as its an internal sql already built-in the application. So far I have tried to put in an index on the key and another on hash column but doesn't provide much help.
Off the top of my head, you could create a table, perhaps a temporary table (which can also accept indices), containing the values in the IN predicate of your current query. Then, a simple inner join of your original query to this table would have the same effect, except it might be substantially faster if DB2 can take advantage of the index. You would need to make sure that the columns involved in the join both have an index.
You (or your DBA) might like to try the Design Advisor to advise on new indexes etc.
This can be run from the command line tool db2advis https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.admin.perf.doc/doc/c0005144.html
Or from Data Server Manager which provides a Web Based front end for the
Query Advisor and Access Path Advisor. https://www-01.ibm.com/support/docview.wss?uid=swg27048195

Sql indexes vs full table scan

While writing complex SQL queries, how do we ensure that we are using proper indexes and avoiding full table scans? I do it by making sure I only join on columns that have indexes(primary key, unique key etc). Is this enough?
Ask the database for the execution plan for your query, and proceed from there.
Don't forget to index the columns that appear in your where clause as well.
Look at the execution plan of the query to see how the query optimizer thinks things must be retrieved. The plan is generally based on the statistics on the tables, the selectivity of the indices and the order of the joins. Note that the optimizer can decide that performing a full table scan is 'cheaper' than index lookup.
other Things to look for:
avoid subqueries if possible.
minimize the use of 'OR'-predicates
in the where clause
It is hard to say what is the best indexing because there are different strategies depend on situation. Still there are coupe things you should now about indexes.
Index SOMETIMES increase performance on select statement and ALWAYS decrease performance on insert and update.
To index table it is not necessary to make it as key on certain field. Also, real life indexes almost always include several fields.
Don't create any indexes for "future purposes" if your performance is satisfactory. Even if you don't have indexes at all.
Always try to analyze execution plan when tuning indexes. Don't be afraid to experiment.
+
Table scan is not always bad thing.
That is all from me.
Use Database Tuning Advisor(SQL Server) to analyse your query. It will suggest necessary indexes to add to tune your query performance

SQL Server Express performance issue

I know my questions will sound silly and probably nobody will have perfect answer but since I am in a complete dead-end with the situation it will make me feel better to post it here.
So...
I have a SQL Server Express database that's 500 Mb. It contains 5 tables and maybe 30 stored procedure. This database is use to store articles and is use for the Developer It web site. Normally the web pages load quickly, let's say 2 ou 3 sec. BUT, sqlserver process uses 100% of the processor for those 2 or 3 sec.
I try to find which stored procedure was the problem and I could not find one. It seems like every read into the table dans contains the articles (there are about 155,000 of them and 20 or so gets added every 15 minutes).
I added few indexes but without luck...
It is because the table is full text indexed ?
Should I have order with the primary key instead of date ? I never had any problems with ordering by dates....
Should I use dynamic SQL ?
Should I add the primary key into the URL of the articles ?
Should I use multiple indexes for separate columns or one big index ?
I you want more details or code bits, just ask for it.
Basically, every little hint is much appreciated.
Thanks.
If your index is not being used, then it usually indicates one of two problems:
Non-sargable predicate conditions, such as WHERE DATEPART(YY, Column) = <something>. Wrapping columns in a function will impair or eliminate the optimizer's ability to effectively use an index.
Non-covered columns in the output list, which is very likely if you're in the habit of writing SELECT * instead of SELECT specific_columns. If the index doesn't cover your query, then SQL Server needs to perform a RID/key lookup for every row, one by one, which can slow down the query so much that the optimizer just decides to do a table scan instead.
See if one of these might apply to your situation; if you're still confused, I'd recommend updating the question with more information about your schema, the data, and the queries that are slow. 500 MB is very small for a SQL database, so this shouldn't be slow. Also post what's in the execution plan.
Use SQL Profiler to capture a lot of typical queries used in your app. Then run the profiler results through index tuning wizard. That will tell you what indexes can be added to optimize.
Then look at the worst performing queries and analyze their execution plans manually.

Performance Tuning SQL - How?

How does one performance tune a SQL Query?
What tricks/tools/concepts can be used to change the performance of a SQL Query?
How can the benefits be Quantified?
What does one need to be careful of?
What tricks/tools/concepts can be used to change the performance of a SQL Query?
Using Indexes? How do they work in practice?
Normalised vs Denormalised Data? What are the performance vs design/maintenance trade offs?
Pre-processed intermediate tables? Created with triggers or batch jobs?
Restructure the query to use Temp Tables, Sub Queries, etc?
Separate complex queries into multiples and UNION the results?
Anything else?
How can performance be Quantified?
Reads?
CPU Time?
"% Query Cost" when different versions run together?
Anything else?
What does one need to be careful of?
Time to generate Execution Plans? (Stored Procs vs Inline Queries)
Stored Procs being forced to recompile
Testing on small data sets (Do the queries scale linearly, or square law, etc?)
Results of previous runs being cached
Optimising "normal case", but harming "worst case"
What is "Parameter Sniffing"?
Anything else?
Note to moderators:
This is a huge question, should I have split it up in to multiple questions?
Note To Responders:
Because this is a huge question please reference other questions/answers/articles rather than writing lengthy explanations.
I really like the book "Professional SQL Server 2005 Performance Tuning" to answer this. It's Wiley/Wrox, and no, I'm not an author, heh. But it explains a lot of the things you ask for here, plus hardware issues.
But yes, this question is way, way beyond the scope of something that can be answered in a comment box like this one.
Writing sargable queries is one of the things needed, if you don't write sargable queries then the optimizer can't take advantage of the indexes. Here is one example Only In A Database Can You Get 1000% + Improvement By Changing A Few Lines Of Code this query went from over 24 hours to 36 seconds
Of course you also need to know the difference between these 3 join
loop join,
hash join,
merge join
see here: http://msdn.microsoft.com/en-us/library/ms173815.aspx
Here some basic steps that need to follow:
Define business requirements first
SELECT fields instead of using SELECT *
Avoid SELECT DISTINCT
Create joins with INNER JOIN (not WHERE)
Use WHERE instead of HAVING to define filters
Proper indexing
Here are some basic steps which we can follow to increase the performance:
Check for indexes in pk and fk for the tables involved if it is still taking time index the columns present in the query.
All indexes are modified after every operation so kindly do not index each and every column
Before batch insertion delete the indexes and then recreate the indexes.
Select sparingly
Use if exists instead of count
Before accusing dba first check network connections

What generic techniques can be applied to optimize SQL queries?

What techniques can be applied effectively to improve the performance of SQL queries? Are there any general rules that apply?
Use primary keys
Avoid select *
Be as specific as you can when building your conditional statements
De-normalisation can often be more efficient
Table variables and temporary tables (where available) will often be better than using a large source table
Partitioned views
Employ indices and constraints
Learn what's really going on under the hood - you should be able to understand the following concepts in detail:
Indexes (not just what they are but actually how they work).
Clustered indexes vs heap allocated tables.
Text and binary lookups and when they can be in-lined.
Fill factor.
How records are ghosted for update/delete.
When page splits happen and why.
Statistics, and how they effect various query speeds.
The query planner, and how it works for your specific database (for instance on some systems "select *" is slow, on modern MS-Sql DBs the planner can handle it).
The biggest thing you can do is to look for table scans in sql server query analyzer (make sure you turn on "show execution plan"). Otherwise there are a myriad of articles at MSDN and elsewhere that will give good advice.
As an aside, when I started learning to optimize queries I ran sql server query profiler against a trace, looked at the generated SQL, and tried to figure out why that was an improvement. Query profiler is far from optimal, but it's a decent start.
There are a couple of things you can look at to optimize your query performance.
Ensure that you just have the minimum of data. Make sure you select only the columns you need. Reduce field sizes to a minimum.
Consider de-normalising your database to reduce joins
Avoid loops (i.e. fetch cursors), stick to set operations.
Implement the query as a stored procedure as this is pre-compiled and will execute faster.
Make sure that you have the correct indexes set up. If your database is used mostly for searching then consider more indexes.
Use the execution plan to see how the processing is done. What you want to avoid is a table scan as this is costly.
Make sure that the Auto Statistics is set to on. SQL needs this to help decide the optimal execution. See Mike Gunderloy's great post for more info. Basics of Statistics in SQL Server 2005
Make sure your indexes are not fragmented. Reducing SQL Server Index Fragmentation
Make sure your tables are not fragmented. How to Detect Table Fragmentation in SQL Server 2000 and 2005
Use a with statment to handle query filtering.
Limit each subquery to the minimum number of rows possible.
then join the subqueries.
WITH
master AS
(
SELECT SSN, FIRST_NAME, LAST_NAME
FROM MASTER_SSN
WHERE STATE = 'PA' AND
GENDER = 'M'
),
taxReturns AS
(
SELECT SSN, RETURN_ID, GROSS_PAY
FROM MASTER_RETURNS
WHERE YEAR < 2003 AND
YEAR > 2000
)
SELECT *
FROM master,
taxReturns
WHERE master.ssn = taxReturns.ssn
A subqueries within a with statement may end up as being the same as inline views,
or automatically generated temp tables. I find in the work I do, retail data, that about 70-80% of the time, there is a performance benefit.
100% of the time, there is a maintenance benefit.
I think using SQL query analyzer would be a good start.
In Oracle you can look at the explain plan to compare variations on your query
Make sure that you have the right indexes on the table. if you frequently use a column as a way to order or limit your dataset an index can make a big difference. I saw in a recent article that select distinct can really slow down a query, especially if you have no index.
The obvious optimization for SELECT queries is ensuring you have indexes on columns used for joins or in WHERE clauses.
Since adding indexes can slow down data writes you do need to monitor performance to ensure you don't kill the DB's write performance, but that's where using a good query analysis tool can help you balanace things accordingly.
Indexes
Statistics
on microsoft stack, Database Engine Tuning Advisor
Some other points (Mine are based on SQL server, since each db backend has it's own implementations they may or may not hold true for all databases):
Avoid correlated subqueries in the select part of a statement, they are essentially cursors.
Design your tables to use the correct datatypes to avoid having to apply functions on them to get the data out. It is far harder to do date math when you store your data as varchar for instance.
If you find that you are frequently doing joins that have functions in them, then you need to think about redesigning your tables.
If your WHERE or JOIN conditions include OR statements (which are slower) you may get better speed using a UNION statement.
UNION ALL is faster than UNION if (And only if) the two statments are mutually exclusive and return the same results either way.
NOT EXISTS is usually faster than NOT IN or using a left join with a WHERE clause of ID = null
In an UPDATE query add a WHERE condition to make sure you are not updating values that are already equal. The difference between updating 10,000,000 records and 4 can be quite significant!
Consider pre-calculating some values if you will be querying them frequently or for large reports. A sum of the values in an order only needs to be done when the order is made or adjusted, rather than when you are summarizing the results of 10,000,000 million orders in a report. Pre-calculations should be done in triggers so that they are always up-to-date is the underlying data changes. And it doesn't have to be just numbers either, we havea calculated field that concatenates names that we use in reports.
Be wary of scalar UDFs, they can be slower than putting the code in line.
Temp table tend to be faster for large data set and table variables faster for small ones. In addition you can index temp tables.
Formatting is usually faster in the user interface than in SQL.
Do not return more data than you actually need.
This one seems obvious but you would not believe how often I end up fixing this. Do not join to tables that you are not using to filter the records or actually calling one of the fields in the select part of the statement. Unnecessary joins can be very expensive.
It is an very bad idea to create views that call other views that call other views. You may find you are joining to the same table 6 times when you only need to once and creating 100,000,00 records in an underlying view in order to get the 6 that are in your final result.
In designing a database, think about reporting not just the user interface to enter data. Data is useless if it is not used, so think about how it will be used after it is in the database and how that data will be maintained or audited. That will often change the design. (This is one reason why it is a poor idea to let an ORM design your tables, it is only thinking about one use case for the data.) The most complex queries affecting the most data are in reporting, so designing changes to help reporting can speed up queries (and simplify them) considerably.
Database-specific implementations of features can be faster than using standard SQL (That's one of the ways they sell their product), so get to know your database features and find out which are faster.
And because it can't be said too often, use indexes correctly, not too many or too few. And make your WHERE clauses sargable (Able to use indexes).