So I am curious to know if it is worth creating a clustered index on a heap table that has about 30M rows of data. Before now, it wasn't going to be used in any application that we have but now we are creating an app to query that table.
The reason why I ask if it is worth it is because the application we are creating is basically doing this type of query.
SELECT *
FROM [table];
I am leaving the * in to represent that we are basically pulling all fields.
So my question is, is it worth creating a clustered index on a table that does not have one even though we are going to be selecting all fields and rows for our application?
Thanks for any info/advice.
No it is not worth it. If you are going to run a select without a where clause, a clustered index will just add more data to the Page files, depending on what you choose for your index(It all really depends on your data). Creating a larger scan of the table. A Heap table is the actual better performance wise in many situations(if you are just getting all rows from a table and not using joins/wheres/filter clauses of some sort), because it is stored in less page files.
Having a clustered index, when it isnt used will also bear some overhead in updating/creating stats on a table and doing inserts (page splits)
So if you arent going to use the index, and aren't going to filter on your table you are better off without the index
Related
Let's say I have table with many columns (20 for instance), and I often do the search by one of them. If I create non-clustered index for that column, then I know I should also include other columns from select statement to cover the query.
But what if the query is SELECT *, should I include all columns to index? I know I am making a copy of the whole table by doing that, is it good or bad practice?
Indexing the most / whole table is not usually a good idea, especially if there are inserts / updates / deletes to the table. When all the wanted fields are not included in the index, a key lookup must be made using the clustered index to find the row(s) from the table. How good / bad that is depends on how many rows you're fetching and how many levels there are in the clustered index -- and that's why it's good to have a narrow clustering key, preferably an int.
If you have to do key lookups for significant portion of the rows in the table, it's usually a lot faster just to scan the whole table. That is most likely the case in your scenario too, because doing key lookups isn't going to be that expensive, if there's not much rows affected, so indexing all fields wouldn't really help.
Of course if your table is huge, indexing all the columns might help, at least in theory. I haven't ever even considered doing that, but I would assume it would help when scanning the whole table would be a costly operation. This of course only in case that the table doesn't get much updates, because maintaining the index would cause problems too.
I understand that we create indexes in order to facilitate look-ups and retrieval of data from the disk especially that data is located in many blocks. Let us suppose that we have a table stored in our database and that table is already sorted based on some criteria. Is it worth it to create an index on that table so that the retrieval is even more faster?
I will simplify a bit, but it should still get the point across. A table can only be physically sorted according to one key, called the clustered index, usually the same as the primary key. If you need to do lookups on columns other than those contained the clustered index, the data will not be sorted and there is the potential for the need for a full table scan on the clustered index. If your table is large enough and you do a lot of queries that involve columns other than the clustered index, then you will need to create additional indexes on the other columns.
As always, actually measure the results to see if it matters and also make sure to look at execution plans to see if it makes a difference. In some cases, it doesn't matter.
Finally, indexes will slow down insert and update operations, as the indexes will need to be updated in addition to the regular table data. You will thus need to consider the types of operations that frequently happen on your table. If inserts are infrequent, but reads are frequent, then indexes will help. If you're mostly inserting data and rarely reading it, don't bother with the indexes.
I'd appreciate some advice from SQL Server gurus here. Let me explain...
I have an SQL Server 2008 database table that has 21 columns. Here's a quick type of those:
INT Primary Key
Several other INT's that are indexes already (used to reference this and other tables)
Several NVARCHAR(64) to hold user-provided text
Several NVARCHAR(256) to hold longer user-provided text
Several DATETIME2
One BIGINT
Several UNIQUEIDENTIFIER, one is already an index
The way this table is used is that it is presented to a user as a sortable table and a user can choose which column to sort it by. This table may contain many thousands of records (like currently it does 21,000 and it will be growing.)
So my question is, do I need to set each column as an INDEX to enable faster sorting?
PS. Forgot to say. The output obviously supports pagination, so the user sees no more than 100 rows at once.
Contrary to popular belief, just having an index on a column does not guarantee that any queries will be any faster!
If you constantly use SELECT *.. from that table, these non-clustered indices on a single column will most likely not be used at all.
A good nonclustered index is a covering index, which means, it contains all the necessary columns to satisfy one or multiple given queries. If you have this situation, then a nonclustered index can make sense - otherwise, in more cases than not, the nonclustered index is likely to be ignored by the query optimizer. The reason for this being: if you need all the columns anyway, the query would have to do key lookups from the nonclustered index into the actual data (the clustered index) for each row found - and the key lookup is a very expensive operation, so doing this for a lots of hits becomes overly costly, and the query optimizer will rather quickly switch to a index scan (possibly the clustered index scan) to fetch the data.
Don't over-index - use a well-designed clustered index, put indices on the foreign key columns to speed up joins - and then let it be for the time being. Observe your system, measure performance, maybe add an index here or there - but don't just overload the system with tons of indices!
Having too many indices can be worse than having none - every index must be maintained, e.g. updated for each INSERT, UPDATE and DELETE statement - does that take time!
this table is ... presented to a user as a sortable table ... [that] may contain many thousands of records
If you're ordering many thousands of records for display, you're doing it wrong. Typical users can reasonably process at most around 500 typical records. Exceptional users can handle a couple thousand. Any more than that, and you're misleading your users into a false sense that they've seen a representative sample. This results in poor decision making and inefficient user workflow. Instead, you need to focus on a good search algorithm.
Another to keep in mind here is that more indexes means slower inserts and updates. It's a balancing act. Sql Server keeps statistics on what queries and sorts it actually performs, and makes those statistics available to you. There are queries you can run that tell you exactly what indexes Sql Server thinks it could use. I would deploy without any sorting index and let it run for a week or two that way. Then look at data and see what users actually sort on and index just those columns.
Take a look at this link for an example and introduction on finding missing indexes:
http://sqlserverpedia.com/wiki/Find_Missing_Indexes
Generally indexes use to accelerate WHERE conditions (in some cases JOINS). so I don't thinks create index on column except PRIMARY KEY accelerate sorting. you can do your sorting in clients(if you use win forms or wpf) or in database for web scenarios
Good Luck
If I have a field in a table of some date type and I know that I will always be searching it using comparisons like between, > or < and never = could there be a good reason not to add an index for it?
The only reason not to add an index on a field you are going to search on is that the cost of maintaining the index overweights its benefits.
This may happen if:
You have a really tough DML on your table
The existence of the index makes it intolerably slow, and
It's more important to have fast DML than the fast queries.
If it's not the case, then just create the index. The optimizer just won't use it if it thinks it's not needed.
There are far more bad reasons.
However, an index on the search column may not be enough if the index is nonclustered and non-covering. Queries like this are often good candidates for clustered indexes, however a covering index is just as good.
This is a great example of why this is as much art as science. Some considerations:
How often is data added to this table? If there is far more reading/searching than adding/changing (the whole point of some tables to dump data into for reporting), then you want to go crazy with indexes. You clustered index might be needed more for the ID field, but you can have plenty of multi-column indexes (where the date fields comes later, with columns listed earlier in the index do a good job of reducing the result set), and covered indexes (where all returned values are in the index, so it's very fast, like you're searching on the clustered index to begin with).
If the table is edited/added to often, or you have limited storage space and hence can't have tons of indexes, then you have to be more careful with your indexes. If your date criteria typically gives a wide range of data, and you don't search often on other fields, then you could give a clustered index over to this date field, but think several times before you do that. You clustered index being on a simple autonumber field is a bonus for all you indexes. Non-covered indexes use the clustered index to zip to the records for the result set. Don't move the clustered index to a date field unless the vast majority of your searching is on that date field. It's the nuclear option.
If you can't have a lot of covered indexes (data changes a lot on the table, there's limited space, your result sets are large and varied), and/or you really need the clustered index for another column, and the typical date criteria gives a wide range of records, and you have to search a lot, you've got problems. If you can dump data to a reporting table, do that. If you can't, then you'll have to balance all these competing factors carefully. Maybe for the top 2-3 searches you minimize the result-set columns as much as you can configure covered indexes, and you let the rest make due with a simple non -clustered index
You can see why good db people should be paid well. I know a lot of the factors, but I envy people to can balance all these things quickly and correctly without having to do a lot of profiling.
Don't index it IF you want to scan the entire table every time. I would want the database to try and do a range scan, so I'd add the index, but I use SQL Server and it will use the index in most cases. However different databases many not use the index.
Depending on the data, I'd go further than that, and suggest it could be a clustered index if you're going to be doing BETWEEN queries, to avoid the table scan.
While an index helps for querying the table, it will also slow down inserts, updates and deletes somewhat. If you have a lot more changes in the table than queries, an index can hurt the overall performance.
If the table is small it might never use the indexes therefore adding them may just be wasting resources.
There are datatypes (like image in SQL Server) and data distributions where indexes are unlikely to be used or can't be used. For instance in SQL Server, it is pointless to index a bit field as there is not enough variability in the data for an index to do any good.
If you usually query with a like clause and a wildcard as the first character, no index will be used, so creating one is another waste of reseources.
I have a table with many millions of rows. I need to find all the rows with a specific column value. That column is not in an index, so a table scan results.
But would it be quicker to add an index with the column at the head (prime key following), do the query, then drop the index?
I can't add an index permanently as the user is nominating what column they're looking for.
Two questions to think about:
How many columns could be nominated for the query?
Does the data change frequently? A lot of it?
If you have a small number of candidate columns, and the data doesn't change a lot, then you might want to consider adding a permanent index on any or even all candidate column.
"Blasphemy!", I hear. Most sources tell you to "never" index every column of a table, but that advised is rooted on the generic assumption that tables are modified frequently.
You will pay a price in additional storage, as well as a performance hit when the data changes.
How small is small and how much is a lot, and is the tradeoff worth it?
There is no way to tell a priory because "too slow" is usually a subjective measurement.
You will have to try it, measure the size of your indexes and then the effect they have in the searches. You will have to balance the costs against the increase in satisfaction of your customers.
[Added] Oh, one more thing: temporary indexes are not only physically slower than a table scan, but they would destroy your concurrency. Re-indexing a table usually (always?) requires a full table lock, so in effect only one user search could be done at a time.
Good luck.
I'm no DBA, but I would guess that building the index would require scanning the table anyway.
Unless there are going to be multiple queries on that column, I would recommend not creating the index.
Best to check the explain plans/execution times for both ways, though!
As everyone else has said, it most certainly would not be faster to add an index than it would be to do a full scan of that column.
However, I would suggest tracking the query pattern and find out which column(s) are searched for the most, and add indexes at least for them. You may find out that 3-4 indexes speeds up 90% of your queries.
Adding an index requires a table scan, so if you can't add a permanent index it sounds like a single scan will be (slightly) faster.
No, that would not be quicker. What would be quicker is to just add the index and leave it there!
Of course, it may not be practical to index every column, but then again it may. How is data added to the table?
It wouldn't be. Creating an index is more complex than simply scanning the column, even if the computational complexity is the same.
That said - how many columns do you have? Are you sure you can't just create an index for each of them if the query time for a single find is too long?
It depends on the complexity of your query. If you're retrieving the data once, then doing a table scan is faster. However, if you're going back to the table more than once for related information in the same query, then the index is faster.
Another related strategy is to do the table scan, and put all the data in a temporary table. Then index THAT and then you can do all your subsequent selects, groupings, and as many other queries on the subset of indexed data. The benefit being that looking up related information in related tables using the temp table is MUCH faster.
However, space is cheap these days, so you'd probably best be served by examining how your users actually USE your system and adding indexes on those frequent columns. I have yet to see users use ALL the search parameters ALL the time.
Your solution will not scale unless you add a permanent index to each column, with all of the columns that are returned in the query in the list of included columns (a covering index). These indexes will be very large, and inserts and updates to that table will be a bit slower, but you don't have much of a choice if you are allowing a user to arbitrarily select a search column.
How many columns are there? How often does the data get updated? How fast do inserts and updates need to run? There are trade-offs involved, depending on the answers to those questions. Do plenty of experimentation and testing so you know for sure how things will perform.
But to your original question, adding and dropping an index for the purpose of a single query is only beneficial if you do more than one select during the query (for example, the select is in a sub-query that gets run for each row returned).