How to know when to use indexes and which type?

I've searched a bit and didn't see any similar question, so here goes.
How do you know when to put an index in a table? How do you decide which columns to include in the index? When should a clustered index be used?
Can an index ever slow down the performance of select statements? How many indexes is too many and how big of a table do you need for it to benefit from an index?
EDIT:
What about column data types? Is it ok to have an index on a varchar or datetime?

Well, the first question is easy:
When should a clustered index be used?
Always. Period. Except for a very few, rare, edge cases. A clustered index makes a table faster, for every operation. YES! It does. See Kim Tripp's excellent The Clustered Index Debate continues for background info. She also mentions her main criteria for a clustered index:
narrow
static (never changes)
unique
if ever possible: ever increasing
INT IDENTITY fulfills this perfectly - GUIDs do not. See GUIDs as Primary Key for extensive background info.
Why narrow? Because the clustering key is added to each and every index page of each and every non-clustered index on the same table (in order to be able to actually look up the data row, if needed). You don't want to have VARCHAR(200) in your clustering key....
Why unique? See above - the clustering key is the item and mechanism that SQL Server uses to uniquely find a data row. It has to be unique. If you pick a non-unique clustering key, SQL Server itself will add a 4-byte uniqueifier to your keys. Be careful of that!
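As a sketch of what those four criteria look like in practice (T-SQL; the table and its columns are made up):

    -- Narrow, static, unique, ever-increasing: an INT IDENTITY surrogate key
    -- declared as the clustered primary key.
    CREATE TABLE dbo.Orders
    (
        OrderID    INT IDENTITY(1,1) NOT NULL,
        CustomerID INT               NOT NULL,
        OrderDate  DATETIME          NOT NULL,
        Notes      VARCHAR(200)      NULL,   -- fine as data, bad as a clustering key
        CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
    );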
Next: non-clustered indices. Basically there's one rule: any foreign key in a child table referencing another table should be indexed, it'll speed up JOINs and other operations.
Furthermore, any queries that have WHERE clauses are good candidates - tackle the most frequently executed ones first. Put indices on columns that show up in WHERE clauses and in ORDER BY clauses.
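Continuing the hypothetical Orders table from the sketch above, those two rules might translate to:

    -- Index the foreign key, to speed up JOINs back to the parent table.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
        ON dbo.Orders (CustomerID);

    -- Index a column that shows up in frequent WHERE / ORDER BY clauses.
    CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
        ON dbo.Orders (OrderDate);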
Next: measure your system, check the DMVs (dynamic management views) for hints about unused or missing indices, and tweak your system over and over again. It's an ongoing process; you'll never be done! See here for info on those two DMVs (missing and unused indices).
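A minimal sketch of querying those two DMVs in SQL Server (the column picks and ordering are my own arbitrary choices):

    -- Suggestions from the missing-index DMVs, roughly ranked by impact.
    SELECT d.statement AS table_name, d.equality_columns,
           d.inequality_columns, d.included_columns,
           s.user_seeks, s.avg_user_impact
    FROM sys.dm_db_missing_index_details d
    JOIN sys.dm_db_missing_index_groups g ON g.index_handle = d.index_handle
    JOIN sys.dm_db_missing_index_group_stats s ON s.group_handle = g.index_group_handle
    ORDER BY s.user_seeks * s.avg_user_impact DESC;

    -- Indexes that are being written to but never read.
    SELECT OBJECT_NAME(u.object_id) AS table_name, i.name AS index_name,
           u.user_updates
    FROM sys.dm_db_index_usage_stats u
    JOIN sys.indexes i ON i.object_id = u.object_id AND i.index_id = u.index_id
    WHERE u.database_id = DB_ID()
      AND i.index_id > 0
      AND u.user_seeks + u.user_scans + u.user_lookups = 0;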
Another word of warning: with a truckload of indices, you can make any SELECT query go really really fast. But at the same time, INSERTs, UPDATEs and DELETEs which have to update all the indices involved might suffer. If you only ever SELECT - go nuts! Otherwise, it's a fine and delicate balancing act. You can always tweak a single query beyond belief - but the rest of your system might suffer in doing so. Don't over-index your database! Put a few good indices in place, check and observe how the system behaves, and then maybe add another one or two, and again: observe how the total system performance is affected by that.

Rule of thumb is to index the primary key (implied, and defaults to clustered) and each foreign key column.
There is more, but you could do worse than starting with SQL Server's missing index DMVs.
An index may slow down a SELECT if the optimiser makes a bad choice, and it is possible to have too many. Too many will slow down writes, and it's also possible to end up with overlapping (redundant) indexes.

Answering the ones I can: I would say that every table, no matter how small, will always benefit from at least one index, as there has to be at least one way in which you are interested in looking up the data; otherwise, why store it?
A general rule for adding indexes is: add one if you need to find data in the table using a particular field or set of fields. This leads on to the question of how many indexes are too many. Generally, the more indexes you have, the slower inserts and updates will be, since they also have to modify the indexes - but it all depends on how you use your data. If you need fast inserts, then don't use too many. In reporting-style "read only" data stores you can have a number of them to make all your lookups faster.
Unfortunately there is no one rule to guide you on the number or type of indexes to use, although the query optimiser of your chosen DB can give hints based on the queries you are executing.
As to clustered indexes: they are the ace card you only get to play once, so choose carefully. It's worth calculating the selectivity of the field you are thinking of putting it on, as it can be wasted on something like a boolean field (a contrived example) where the selectivity of the data is very low.
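A quick way to eyeball that selectivity (a plain SQL sketch; the table and column names are made up):

    -- Selectivity = distinct values / total rows.
    -- Near 1.0 is highly selective; a boolean-style column lands near 0.
    SELECT COUNT(DISTINCT status) * 1.0 / COUNT(*) AS selectivity
    FROM orders;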

This is really a very involved question, though a good starting place would be to index any column that you will filter results on. I.e., if you often break products into groups by sale price, index the sale_price column of the products table to improve scan times for that query, etc.
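For instance, that index might look like this (a sketch, assuming the products table described):

    CREATE INDEX IX_products_sale_price ON products (sale_price);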

If you are querying based on the value in a column, you probably want to index that column.
i.e.
SELECT a,b,c FROM MyTable WHERE x = 1
You would want an index on x.
Generally, I add indexes for columns which are frequently queried, and I add compound indexes when I'm querying on more than one column.
Indexes won't hurt the performance of a SELECT, but they may slow down INSERTs (and UPDATEs) if you have too many indexed columns per table.
As a rule of thumb - start off by adding indexes when you find yourself saying WHERE a = 123 (in this case, an index for "a").
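A minimal sketch of that rule of thumb (MyTable comes from the example above; the index names are made up):

    -- You keep writing queries like: ... WHERE a = 123
    CREATE INDEX IX_MyTable_a ON MyTable (a);

    -- Filtering on two columns together? A compound index:
    CREATE INDEX IX_MyTable_x_y ON MyTable (x, y);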

You should use an index on columns that you use for selection and ordering - i.e. the WHERE and ORDER BY clauses.
Indexes can slow down select statements if there are many of them and your WHERE and ORDER BY clauses use columns that have not been indexed.
As for the size of the table - several thousand rows and upwards is where you start to see real benefit from index usage.
Having said that, there are automated tools to help with this, and SQL Server has a Database Engine Tuning Advisor that will help with this.

When isn't it appropriate to use a SQL Index

I was asked a question today on when wouldn't I want to create a SQL Index on a table.
The only thing I can think of is when you don't need one (i.e. a small table). That answer doesn't feel right. Is there a threshold for when I should use an index and when I shouldn't?
When deciding whether or not to create an index on a table, there are lots of things to consider.
First, is that there are a lot of possible indexes you could create. For example, you could create an index containing not only every column in the table, but every permutation of the columns (since column ordering in indexes does matter). This could be a huge number of indexes as your column count gets higher.
Every index comes with a number of things that decrease performance in different ways. For example, they may take memory/disk space from what is available. Probably worse than this though, is the fact that indexes need to be updated when the table underneath it is updated. This means that every insert/update/delete in a table, can trigger an index update. As you have more indexes, that's more indexes to update, which can kill performance on your CUD operations, and can kill your server performance if you are doing these often.
Because of this performance impact, you want to avoid 'useless' indexes. Indexes that are used by every query are typically good, but an index used only once a day for a <1s query is probably useless. It's all a tradeoff: determining which indexes are useful enough to keep, i.e. whose performance benefits are greater than their performance costs.
You could answer it with the counter-question: when do you need an index?
You need an index if you want to search for entries and get your results faster - for example, if the column is used in a WHERE clause. Of course you could try to index everything, but indexing costs extra memory/disk space. So you only index columns you use to find your rows.
Which rows MySQL, for example, reads while trying to find your rows is something you can analyze with the EXPLAIN command.
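For example (MySQL; the table and column are made up):

    EXPLAIN SELECT * FROM employees WHERE last_name = 'Smith';
    -- The "type" and "rows" columns tell you whether an index is used
    -- (e.g. type: ref) or the whole table is read (type: ALL).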
Does this help?
A rule of thumb is, to drop all indices except the unique index on the primary key, on small tables (less than about 100'000 rows).
Also, it is not appropriate to use an index if the column is not used for searching (e.g. the salary of employees).

Do I need to use this many indexes in my SQL Server 2008 database?

I'd appreciate some advice from SQL Server gurus here. Let me explain...
I have an SQL Server 2008 database table that has 21 columns. Here's a quick rundown of their types:
INT Primary Key
Several other INTs that are already indexed (used to reference this and other tables)
Several NVARCHAR(64) to hold user-provided text
Several NVARCHAR(256) to hold longer user-provided text
Several DATETIME2
One BIGINT
Several UNIQUEIDENTIFIERs, one of which is already indexed
The way this table is used is that it is presented to a user as a sortable table and a user can choose which column to sort it by. This table may contain many thousands of records (currently it has 21,000, and it will keep growing).
So my question is, do I need to set each column as an INDEX to enable faster sorting?
PS. Forgot to say. The output obviously supports pagination, so the user sees no more than 100 rows at once.
Contrary to popular belief, just having an index on a column does not guarantee that any queries will be any faster!
If you constantly use SELECT *.. from that table, these non-clustered indices on a single column will most likely not be used at all.
A good nonclustered index is a covering index, which means it contains all the necessary columns to satisfy one or more given queries. If you have this situation, then a nonclustered index can make sense - otherwise, more often than not, the nonclustered index is likely to be ignored by the query optimizer. The reason: if you need all the columns anyway, the query would have to do key lookups from the nonclustered index into the actual data (the clustered index) for each row found - and the key lookup is a very expensive operation, so doing this for a lot of hits becomes overly costly, and the query optimizer will rather quickly switch to an index scan (possibly the clustered index scan) to fetch the data.
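A sketch of such a covering index in T-SQL (the table, columns and index name are hypothetical); the INCLUDE clause carries the extra columns at the leaf level, so no key lookup is needed:

    -- Covers: SELECT OrderDate, TotalDue FROM dbo.Orders WHERE CustomerID = @id
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
        ON dbo.Orders (CustomerID)
        INCLUDE (OrderDate, TotalDue);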
Don't over-index - use a well-designed clustered index, put indices on the foreign key columns to speed up joins - and then let it be for the time being. Observe your system, measure performance, maybe add an index here or there - but don't just overload the system with tons of indices!
Having too many indices can be worse than having none - every index must be maintained, i.e. updated for each INSERT, UPDATE and DELETE statement - and that takes time!
this table is ... presented to a user as a sortable table ... [that] may contain many thousands of records
If you're ordering many thousands of records for display, you're doing it wrong. Typical users can reasonably process at most around 500 typical records. Exceptional users can handle a couple thousand. Any more than that, and you're misleading your users into a false sense that they've seen a representative sample. This results in poor decision making and inefficient user workflow. Instead, you need to focus on a good search algorithm.
Another thing to keep in mind here is that more indexes mean slower inserts and updates. It's a balancing act. SQL Server keeps statistics on what queries and sorts it actually performs, and makes those statistics available to you. There are queries you can run that tell you exactly what indexes SQL Server thinks it could use. I would deploy without any sorting index and let it run for a week or two that way. Then look at the data, see what users actually sort on, and index just those columns.
Take a look at this link for an example and introduction on finding missing indexes:
http://sqlserverpedia.com/wiki/Find_Missing_Indexes
Generally, indexes are used to accelerate WHERE conditions (and in some cases JOINs), so I don't think creating an index on any column other than the PRIMARY KEY will accelerate sorting. You can do your sorting on the client (if you use WinForms or WPF) or in the database for web scenarios.
Good Luck

Database indexes: A good thing, a bad thing, or a waste of time?

Adding indexes is often suggested here as a remedy for performance problems.
(I'm talking about reading & querying ONLY, we all know indexes can make writing slower).
I have tried this remedy many times, over many years, both on DB2 and MSSQL, and the results were invariably disappointing.
My finding has been that no matter how 'obvious' it was that an index would make things better, it turned out that the query optimiser was smarter, and my cleverly-chosen index almost always made things worse.
I should point out that my experiences relate mostly to small tables (<100'000 rows).
Can anyone provide some down-to-earth guidelines on choices for indexing?
The correct answer would be a list of recommendations something like:
Never/always index a table with less than/more than NNNN records
Never/always consider indexes on multi-field keys
Never/always use clustered indexes
Never/always use more than NNN indexes on a single table
Never/always add an index when [some magic condition I'm dying to learn about]
Ideally, the answer will give some instructive examples.
Indexes are kind of like chemotherapy...too much and it kills you...too little and you die...do it the wrong way and you die. You gotta know just how much, how often, and what kind to make it not kill you.
Your hardware, platform, environment, and load all play a role. So, to answer your questions:
Yes, possibly sometimes.
As a rule of thumb, primary keys and foreign keys need to be indexed. Usually primary keys are indexed just by defining them as such, but FKs are not in every database (they definitely are not in SQL Server; I can't really speak for other DBs). You will be using these in joins, so it is generally critical to performance to index them.
Now if you have fields you often use in where clauses, they can benefit from indexes as well, provided several things hold:
First, the field must have a range of values. A bit field, or a field with only 2 or 3 values, will almost never use an index.
Second, the queries you write must be sargable - that is, they must be designed to use indexes. I suspect that if you never get performance improvements from what look like likely candidates for indexes, then you probably have queries that are not sargable. For instance, take WHERE Name LIKE '%Smith' as a where clause. Without knowing the first characters, the optimizer can't use the index.
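To illustrate sargability (a sketch; the table and column are made up):

    -- Not sargable: the leading wildcard defeats an index on Name.
    SELECT * FROM Employees WHERE Name LIKE '%Smith';

    -- Sargable: the index can seek straight to the 'Smith...' range.
    SELECT * FROM Employees WHERE Name LIKE 'Smith%';

    -- Also not sargable: wrapping the column in a function.
    SELECT * FROM Employees WHERE UPPER(Name) = 'SMITH';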
Small tables rarely benefit much from indexes. If the optimizer can hold the whole thing in memory, then it is often faster to do so. If you were working with multimillion record tables, you would see that indexes are critical.
Indexing can be very complex and if you are interested in the subject, I suggest you get a good book on performance tuning your particular database and read in depth about them.
An index that's never used is a waste of disk space, as well as adding to the insert/update/delete time. It's probably best to define the clustering index first, then define additional indexes as you find yourself writing WHERE clauses.
One common index mistake I see is people wondering why a select on col2 (or col3) takes so long when the index is defined as col1 ASC, col2 ASC, col3 ASC. When you have a multiple-column index, your WHERE clause must use the first column in the index, or the first and second columns, and so forth.
If you need to access the data by col2, then you need an additional index that's defined as col2 ASC.
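To make the leftmost-prefix rule concrete (a sketch; the names are made up):

    CREATE INDEX IX_T_c1_c2_c3 ON T (col1 ASC, col2 ASC, col3 ASC);

    -- Can seek on the index (leftmost prefix is present):
    SELECT * FROM T WHERE col1 = 1;
    SELECT * FROM T WHERE col1 = 1 AND col2 = 2;

    -- Cannot seek on it (col1 is missing); this needs its own index:
    SELECT * FROM T WHERE col2 = 2;
    CREATE INDEX IX_T_c2 ON T (col2 ASC);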
With small domain tables, it's sometimes faster to do a table scan than it is to read rows from the table using an index. This depends on the speed of your database machine and the speed of the network.
You need indexes. Only with indexes can you access data fast enough.
To make it as short as possible:
add indexes for columns you are frequently filtering (or grouping) for. (eg. a state or name)
LIKE and SQL functions can prevent the DBMS from using indexes.
add indexes only on columns which have many different values (eg. no boolean fields)
It is common to add indexes to foreign keys, but it is not always needed.
don't add indexes to very short tables
never add indexes when you don't know how they should enhance performance.
Finally: look into execution plans to decide how to optimize queries.
Sometimes you'll add indexes just for a single, critical query. In that case, you'll add exactly the indexes that are needed by the query in question (multi-column indexes).
Basically, when a database is live and collecting data, indexes have to evolve with that flow. There may be a really good index on a table, but after growing beyond XXX records, the same index on the same table becomes useless, and in that case it should be refactored.
The only way to have an optimized and fast DB is to monitor it all the time and refactor it over time as records come in.
A real-life example I ran into some time ago was a super fast query restricted by some time range (created_at between A and B) and a super slow query where the time range was different. Same query, same database, same application - the only difference was the time range.
Always use clustered indexes.
In fact, you can't help but use them. The data in a table will be laid out on disk in some particular order anyway; it can't be saved as an unordered pile or something. You have the chance to specify how exactly this data will be laid out. Why waste it?
When you have a table which gets new records appended and you observe that some value in those records always grow (like StackOverflow question number), make a clustered index out of it. Then the new data will not be inserted in the middle but will basically be appended to a file on disk which is a relatively cheap operation.
If a table is expected to be the target of a join then it is best to have a clustered index on that table so that the joins can be performed sequentially through the data pages. The columns in the clustered index will (on some DB systems) be included in all of the other indexes on that table, since those are the values that the indexes will use to reference the table data. To keep the other indexes from getting too large, the columns in the clustered index should be as narrow as possible, so it is best to use only numeric—rather than character—data types in the clustered index. In general, fewer columns are better than more columns, but notice that three int columns (12 bytes per row) are much better than one nvarchar(32) column (potentially 64 bytes per row).
If the clustered index is narrow, then a few additional indexes should not negatively impact performance very much even on very large tables.
Seems you are confusing two concepts here.
Adding indices generally can only make a read query faster, and very, very rarely (almost never) slower. Adding an index never forces the query optimizer to use it. It will only use it if it thinks it can benefit from it, and it is generally very smart about those decisions.
For inserts/updates, of course, every index hurts performance a bit more... But at the other end of the spectrum, for, say, a read-only database (like a USPS address database which is distributed monthly), in operational use there would be no inserts/updates, so the only negative impact of additional indices is the disk space they take up.
This is entirely different from specifying that the query optimizer USE an index, in effect overriding what it would do on its own... That can potentially make a query slower.
EDIT: Edited to eliminate opportunity for misinterpretation by overly literal readers.

Should searchable date fields in a database table always be indexed?

If I have a field in a table of some date type and I know that I will always be searching it using comparisons like between, > or < and never = could there be a good reason not to add an index for it?
The only reason not to add an index on a field you are going to search on is that the cost of maintaining the index outweighs its benefits.
This may happen if:
You have really heavy DML against your table,
The existence of the index makes it intolerably slow, and
It's more important to have fast DML than fast queries.
If it's not the case, then just create the index. The optimizer just won't use it if it thinks it's not needed.
There are far more bad reasons than good ones.
However, an index on the search column alone may not be enough if the index is nonclustered and non-covering. Queries like this are often good candidates for clustered indexes; however, a covering index is just as good.
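A sketch of both options in T-SQL (the table, columns and index names are hypothetical):

    -- Option 1: cluster on the date, so a BETWEEN range reads contiguous pages.
    CREATE CLUSTERED INDEX CIX_Events_EventDate ON dbo.Events (EventDate);

    -- Option 2: a covering nonclustered index is just as good for a query like
    --   SELECT EventDate, EventType FROM dbo.Events
    --   WHERE EventDate BETWEEN @from AND @to
    CREATE NONCLUSTERED INDEX IX_Events_EventDate
        ON dbo.Events (EventDate)
        INCLUDE (EventType);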
This is a great example of why this is as much art as science. Some considerations:
How often is data added to this table? If there is far more reading/searching than adding/changing (the whole point of some tables is to dump data into them for reporting), then you want to go crazy with indexes. Your clustered index might be needed more for the ID field, but you can have plenty of multi-column indexes (where the date field comes later; columns listed earlier in the index do a good job of reducing the result set) and covering indexes (where all returned values are in the index, so it's very fast, as if you were searching on the clustered index to begin with).
If the table is edited/added to often, or you have limited storage space and hence can't have tons of indexes, then you have to be more careful with your indexes. If your date criteria typically give a wide range of data, and you don't search often on other fields, then you could give the clustered index over to this date field, but think several times before you do that. Your clustered index being on a simple autonumber field is a bonus for all your indexes. Non-covering indexes use the clustered index to zip to the records for the result set. Don't move the clustered index to a date field unless the vast majority of your searching is on that date field. It's the nuclear option.
If you can't have a lot of covering indexes (the data changes a lot, there's limited space, your result sets are large and varied), and/or you really need the clustered index for another column, and the typical date criteria give a wide range of records, and you have to search a lot, you've got problems. If you can dump data to a reporting table, do that. If you can't, then you'll have to balance all these competing factors carefully. Maybe for the top 2-3 searches you minimize the result-set columns as much as you can and configure covering indexes, and you let the rest make do with a simple non-clustered index.
You can see why good DB people should be paid well. I know a lot of the factors, but I envy people who can balance all these things quickly and correctly without having to do a lot of profiling.
Don't index it IF you want to scan the entire table every time. I would want the database to try to do a range scan, so I'd add the index. I use SQL Server and it will use the index in most cases; however, different databases may not use the index.
Depending on the data, I'd go further than that, and suggest it could be a clustered index if you're going to be doing BETWEEN queries, to avoid the table scan.
While an index helps for querying the table, it will also slow down inserts, updates and deletes somewhat. If you have a lot more changes in the table than queries, an index can hurt the overall performance.
If the table is small, it might never use the indexes; therefore, adding them may just be wasting resources.
There are datatypes (like image in SQL Server) and data distributions where indexes are unlikely to be used or can't be used. For instance in SQL Server, it is pointless to index a bit field as there is not enough variability in the data for an index to do any good.
If you usually query with a LIKE clause and a wildcard as the first character, no index will be used, so creating one is another waste of resources.

When should you consider indexing your sql tables?

How many records should there be before I consider indexing my sql tables?
There's no good reason to forego obvious indexes (FKs, etc.) when you're creating the table. It will never noticeably affect performance to have unnecessary indexes on tiny tables, and it's good to take a first cut when your mind is into schema design. Also, some indexes serve to prevent duplicates, which can be useful regardless of table size.
I guess the proper answer to your question is that the number of records in the table should have nothing to do with when to create indexes.
I would create the index entries when I create my table. If you decide to create indices after the table has grown to 100, 1,000, or 100,000 entries, it can take a lot of time and perhaps make your database unavailable while you are doing it.
Think about the table first, create the indices you think you'll need, and then move on.
In some cases you will discover that you should have indexed a column; if that's the case, fix it when you discover it.
Creating an index on a searched field is not premature optimization; it's simply what should be done.
When the query time is unacceptable. Better yet, create a few indexes now that are likely to be useful, and run an EXPLAIN or EXPLAIN ANALYZE on your queries once your database is populated by representative data. If the indexes aren't helping, drop them. If there are slow queries that could benefit from more or different indexes, change the indexes.
You are not going to be locked in to an initial choice of indexes. Experiment, and make sure you measure performance!
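A sketch of that experiment loop (EXPLAIN ANALYZE as written here is PostgreSQL syntax; the table and index names are made up):

    CREATE INDEX idx_orders_created_at ON orders (created_at);

    EXPLAIN ANALYZE
    SELECT * FROM orders
    WHERE created_at BETWEEN '2024-01-01' AND '2024-02-01';
    -- Look for "Index Scan using idx_orders_created_at" vs "Seq Scan",
    -- and compare the actual execution times.

    -- Not helping? Drop it and try a different one.
    DROP INDEX idx_orders_created_at;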
In general I agree with the previous advice.
Always declare the referential integrity for the tables (Primary Key, Foreign Keys), column constraints (not null, check). Saves you from nightmares when apps put bad data into the tables (even in development).
I'd consider adding indexes for the common access columns (columns in your where clauses which are used in =, <> tests), as well.
Most modern RDBMS implementations are quite good at keeping your indexes up to date without hurting performance, so the cost of having indexes is minimal.
Also, most RDBMSs have query plan evaluators which look at the relative costs of going to the data rows via the index versus using some sort of table scan. So, again, the performance hits are minimal.
Two.
I'm serious. If there are two rows now, and there will always be two rows, the cost of indexing is almost zero. It's quicker to index than to ponder whether you should. It won't take the optimizer very long to figure out that scanning the table is quicker than using the index.
If there are two rows now, but there will be 200,000 in the near future, the cost of not indexing could become prohibitively high. The right time to consider indexing is now.
Having said this, remember that you get an index automatically when you declare a primary key. Creating a table with no primary key is asking for trouble in most cases. So the only time you really need to consider indexing is when you want an index other than the index on the primary key. You need to know the traffic, and the anticipated volume to make this call. If you get it wrong, you'll know, and you can reverse the decision.
I once saw a reference table that had been created with no index when it contained 20 rows. Due to a business change, this table had grown to about 900 rows, but the person who should have noticed the absence of an index didn't. The time to insert a new order had grown from about 10 seconds to 15 minutes.
As a matter of routine I perform the following on read heavy tables:
Create indexes on common join fields such as Foreign Keys when I create the table.
Check the query plan for Views or Stored Procedures and add indexes wherever a table scan is indicated.
Check the query plan for queries by my application and add indexes wherever a table scan is indicated. (and often try to make them into Stored Procedures)
On write heavy tables (like activity logs) I avoid indexes unless they are absolutely necessary. I also tend to archive such data into indexed tables at regular intervals.
It depends.
How much data is in the table? How often is data inserted? A lot of indexes can slow down insertion time. Do you always query all the rows of the table? In this case indexes probably won't help much.
Those aren't common usages though. In most cases, you know you're going to be querying a subset of data. On what fields? Are there common fields that are always joined on? Look at query plans for common or typical queries; they will generally show you where it's spending all of its time.
If there's a unique constraint on the table (and there should be at least one), then that will usually be enforced by a unique index.
Otherwise, you add indexes when query performance is bad and adding the index will demonstrably improve it. There are books on the subject of how to create good sets of indexes on tables, including Relational Database Index Design and the Optimizers, which will give you a lot of ideas and the reasons why they are good.
See also:
No indexes on small tables
When to create a new SQL Server index
Best Practices and Anti-Patterns in Creating Indexes
and, no doubt, a host of others.