My question is about limitation of clustered index on a table.
By theory, in a single table we can have only one cluster index. But what if I have datetime columns in a table say "From date" and "To date"? These columns will often required in WHERE clause to populate reports in my application. And if I also require a cluster index on primary key in the same table, then still how to get advantage of cluster index on other columns? In this case my queries will still run slower with larger records.
In practice, you also can have just a single clustered index on a table - since the table's data is physically ordered by that clustered index.
If you need two datetime columns frequently in WHERE clauses, the best choice would be to have a non-clustered index on those two columns, and possibly include additional columns that you frequently retrieve with those queries, in order to make it a covering index.
There's really not much difference between a good, covering non-clustered index, and the clustered index, in terms of query performance.
However, you don't want to bloat your clustered index, since those columns will also be added to all non-clustered indices on the same table - keep it small, preferably an INT, ever-increasing, stable (not changing) and you should be just fine.
Another option is indexed (or materialized) views: you could create multiple views on the table, each with a different clustered index. That might be useful in a reporting scenario, but indexed views have lots of restrictions and will affect the performance of queries that modify the table data. Books Online has all the information you need to create and test them.
I suspect your real requirement is indeed to implement a reporting solution, and if so then it might be best to do it properly: create a separate database with a schema optimized for reporting (Google "star schema") and load data regularly from the main database into the reporting one. But that's a whole new area of development to investigate, and I wouldn't rush into it.
If you need the performance of a cluster index table for multiple indexes of the same table, the only route I see is holding a copy of the table for each cluster index.
The clustered index effects the physical storage of the data in a table, so by definition there can only be one. You can widen your clustered index to include the other columns, but this can have its own disadvantages.
The performance advantage from a clustered index is that the records are stored in a manner that reflects the index (which is why random inserts and updates to the clustered index very quickly fragment the table), and therefore query performance based on this index can be as good as performance as reads from your storage device, you can't get this on other indexes.
I suggest that you choose the index from which you would derive the most benefit from clustering and make that your clustered index. Make the rest of the indexes non-clustered. You may want to run some tests to find out what benefits will be derived from making different indexes clustered vs. non-clustered.
Share and enjoy.
Related
I understand that we create indexes in order to facilitate look-ups and retrieval of data from the disk especially that data is located in many blocks. Let us suppose that we have a table stored in our database and that table is already sorted based on some criteria. Is it worth it to create an index on that table so that the retrieval is even more faster?
I will simplify a bit, but it should still get the point across. A table can only be physically sorted according to one key, called the clustered index, usually the same as the primary key. If you need to do lookups on columns other than those contained the clustered index, the data will not be sorted and there is the potential for the need for a full table scan on the clustered index. If your table is large enough and you do a lot of queries that involve columns other than the clustered index, then you will need to create additional indexes on the other columns.
As always, actually measure the results to see if it matters and also make sure to look at execution plans to see if it makes a difference. In some cases, it doesn't matter.
Finally, indexes will slow down insert and update operations, as the indexes will need to be updated in addition to the regular table data. You will thus need to consider the types of operations that frequently happen on your table. If inserts are infrequent, but reads are frequent, then indexes will help. If you're mostly inserting data and rarely reading it, don't bother with the indexes.
Also, when is it appropriate to use one?
An index is used to speed up searching in the database. MySQL has some good documentation on the subject (which is relevant for other SQL servers as well):
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
An index can be used to efficiently find all rows matching some column in your query and then walk through only that subset of the table to find exact matches. If you don't have indexes on any column in the WHERE clause, the SQL server has to walk through the whole table and check every row to see if it matches, which may be a slow operation on big tables.
The index can also be a UNIQUE index, which means that you cannot have duplicate values in that column, or a PRIMARY KEY which in some storage engines defines where in the database file the value is stored.
In MySQL you can use EXPLAIN in front of your SELECT statement to see if your query will make use of any index. This is a good start for troubleshooting performance problems. Read more here:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
A clustered index is like the contents of a phone book. You can open the book at 'Hilditch, David' and find all the information for all of the 'Hilditch's right next to each other. Here the keys for the clustered index are (lastname, firstname).
This makes clustered indexes great for retrieving lots of data based on range based queries since all the data is located next to each other.
Since the clustered index is actually related to how the data is stored, there is only one of them possible per table (although you can cheat to simulate multiple clustered indexes).
A non-clustered index is different in that you can have many of them and they then point at the data in the clustered index. You could have e.g. a non-clustered index at the back of a phone book which is keyed on (town, address)
Imagine if you had to search through the phone book for all the people who live in 'London' - with only the clustered index you would have to search every single item in the phone book since the key on the clustered index is on (lastname, firstname) and as a result the people living in London are scattered randomly throughout the index.
If you have a non-clustered index on (town) then these queries can be performed much more quickly.
An index is used to speed up the performance of queries. It does this by reducing the number of database data pages that have to be visited/scanned.
In SQL Server, a clustered index determines the physical order of data in a table. There can be only one clustered index per table (the clustered index IS the table). All other indexes on a table are termed non-clustered.
SQL Server Index Basics
SQL Server Indexes: The Basics
SQL Server Indexes
Index Basics
Index (wiki)
Indexes are all about finding data quickly.
Indexes in a database are analogous to indexes that you find in a book. If a book has an index, and I ask you to find a chapter in that book, you can quickly find that with the help of the index. On the other hand, if the book does not have an index, you will have to spend more time looking for the chapter by looking at every page from the start to the end of the book.
In a similar fashion, indexes in a database can help queries find data quickly. If you are new to indexes, the following videos, can be very useful. In fact, I have learned a lot from them.
Index Basics
Clustered and Non-Clustered Indexes
Unique and Non-Unique Indexes
Advantages and disadvantages of indexes
Well in general index is a B-tree. There are two types of indexes: clustered and nonclustered.
Clustered index creates a physical order of rows (it can be only one and in most cases it is also a primary key - if you create primary key on table you create clustered index on this table also).
Nonclustered index is also a binary tree but it doesn't create a physical order of rows. So the leaf nodes of nonclustered index contain PK (if it exists) or row index.
Indexes are used to increase the speed of search. Because the complexity is of O(log N). Indexes is very large and interesting topic. I can say that creating indexes on large database is some kind of art sometimes.
INDEXES - to find data easily
UNIQUE INDEX - duplicate values are not allowed
Syntax for INDEX
CREATE INDEX INDEX_NAME ON TABLE_NAME(COLUMN);
Syntax for UNIQUE INDEX
CREATE UNIQUE INDEX INDEX_NAME ON TABLE_NAME(COLUMN);
First we need to understand how normal (without indexing) query runs. It basically traverse each rows one by one and when it finds the data it returns. Refer the following image. (This image has been taken from this video.)
So suppose query is to find 50 , it will have to read 49 records as a linear search.
Refer the following image. (This image has been taken from this video)
When we apply indexing, the query will quickly find out the data without reading each one of them just by eliminating half of the data in each traversal like a binary search. The mysql indexes are stored as B-tree where all the data are in leaf node.
INDEX is a performance optimization technique that speeds up the data retrieval process. It is a persistent data structure that is associated with a Table (or View) in order to increase performance during retrieving the data from that table (or View).
Index based search is applied more particularly when your queries include WHERE filter. Otherwise, i.e, a query without WHERE-filter selects whole data and process. Searching whole table without INDEX is called Table-scan.
You will find exact information for Sql-Indexes in clear and reliable way:
follow these links:
For cocnept-wise understanding:
http://dotnetauthorities.blogspot.in/2013/12/Microsoft-SQL-Server-Training-Online-Learning-Classes-INDEX-Overview-and-Optimizations.html
For implementation-wise understanding:
http://dotnetauthorities.blogspot.in/2013/12/Microsoft-SQL-Server-Training-Online-Learning-Classes-INDEX-Creation-Deletetion-Optimizations.html
If you're using SQL Server, one of the best resources is its own Books Online that comes with the install! It's the 1st place I would refer to for ANY SQL Server related topics.
If it's practical "how should I do this?" kind of questions, then StackOverflow would be a better place to ask.
Also, I haven't been back for a while but sqlservercentral.com used to be one of the top SQL Server related sites out there.
An index is used for several different reasons. The main reason is to speed up querying so that you can get rows or sort rows faster. Another reason is to define a primary-key or unique index which will guarantee that no other columns have the same values.
So, How indexing actually works?
Well, first off, the database table does not reorder itself when we put index on a column to optimize the query performance.
An index is a data structure, (most commonly its B-tree {Its balanced tree, not binary tree}) that stores the value for a specific column in a table.
The major advantage of B-tree is that the data in it is sortable. Along with it, B-Tree data structure is time efficient and operations such as searching, insertion, deletion can be done in logarithmic time.
So the index would look like this -
Here for each column, it would be mapped with a database internal identifier (pointer) which points to the exact location of the row. And, now if we run the same query.
Visual Representation of the Query execution
So, indexing just cuts down the time complexity from o(n) to o(log n).
A detailed info - https://pankajtanwar.in/blog/what-is-the-sorting-algorithm-behind-order-by-query-in-mysql
INDEX is not part of SQL. INDEX creates a Balanced Tree on physical level to accelerate CRUD.
SQL is a language which describe the Conceptual Level Schema and External Level Schema. SQL doesn't describe Physical Level Schema.
The statement which creates an INDEX is defined by DBMS, not by SQL standard.
An index is an on-disk structure associated with a table or view that speeds retrieval of rows from the table or view. An index contains keys built from one or more columns in the table or view. These keys are stored in a structure (B-tree) that enables SQL Server to find the row or rows associated with the key values quickly and efficiently.
Indexes are automatically created when PRIMARY KEY and UNIQUE constraints are defined on table columns. For example, when you create a table with a UNIQUE constraint, Database Engine automatically creates a nonclustered index.
If you configure a PRIMARY KEY, Database Engine automatically creates a clustered index, unless a clustered index already exists. When you try to enforce a PRIMARY KEY constraint on an existing table and a clustered index already exists on that table, SQL Server enforces the primary key using a nonclustered index.
Please refer to this for more information about indexes (clustered and non clustered):
https://learn.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described?view=sql-server-ver15
Hope this helps!
Here is the scenario:
SQL Server 2000 (8.0.2055)
Table currently has 478 million rows of data. The Primary Key column is an INT with IDENTITY. There is an Unique Constraint imposed on two other columns with a Non-Clustered Index. This is a vendor application and we are only responsible for maintaining the DB.
Now the vendor has recommended doing the following "to improve performance"
Drop the PK and Clustered Index
Drop the non-clustered index on the two columns with the UNIQUE CONSTRAINT
Recreate the PK, with a NON-CLUSTERED index
Create a CLUSTERED index on the two columns with the UNIQUE CONSTRAINT
I am not convinced that this is the right thing to do. I have a number of concerns.
By dropping the PK and indexes, you will be creating a heap with 478 million rows of data. Then creating a CLUSTERED INDEX on two columns would be a really mammoth task. Would creating another table with the same structure and new indexing scheme and then copying the data over, dropping the old table and renaming the new one be a better approach?
I am also not sure how the stored procs will react. Will they continue using the cached execution plan, considering that they are not being explicitly recompiled.
I am simply not able to understand what kind of "performance improvement" this change will provide. I think that this will actually have the reverse effect.
All thoughts welcome.
Thanks in advance,
Raj
All stored procs will recompile automatically. This will happen anyway when stats change and after index maintenance anyway.
At some point, you have to reorganise 478 million rows (drop/create indexes) or move (new table). Neither way is better then the other, unfortunately. I feel your pain though: we have similarly large tables with pending new columns and an index changes.
Saying that, you should do it step 2-1-4-3 to avoid unnecessary non-clustered index maintenance when you drop/create the clustered index.
And drop the unique constraint. The clustered index could be unique and clustered. A unique constrint is just another index that would be unnecessary.
As for the performance benefit, perhaps ask the vendor why.
The one thing I would have a serious look at is the question what type those other two columns are - how big are they, compared to the INT IDENTITY (4 byte) ??
The reason I ask: the clustering key will be added to all non-clustered indices on the table, too - and if you have close to 500 million rows, it will make a huge difference whether the clustering key is a single 4-byte INT, or e.g. two 16-byte GUID's.
This is not only on disk, mind you - the pages are loaded into SQL Server's RAM in their entirety - so by potentially bloating up your clustering key, you'd incur performance penalties due to the larger number of pages on disk (and in RAM) that your non-clustered indices would need.
The only compelling reason I could see to actually go through with those changes would be if by clustering the table using those two other columns, you'd gain something in terms of query performance, e.g. if some of the most frequent queries would be faster, due to the fact that the table is now clustered by these two columns. That's really hard to know unless you know what the access and query patterns really are....
Would creating another table with the same structure and new indexing scheme and then copying the data over, dropping the old table and renaming the new one be a better approach?
I believe that this is what SQL Enterprise Manager will do behind the scenes anyways if you use the visual tools to do this. If you make a schema change such as add a column in the middle of a table, or change primary keys, there is a little button that will allow you to "Script Changes". If you view this script, you can see the steps that Enterprise Manager will take in order to do what you requested.
can i specify more then one column as Non-cluster index, what will it affect?
Like other people have said, this will have an overhead in terms of maintaining the table, it's just a case of whether the benefits outweigh the cost.
The only reasons I can think of for doing adding a multiple-column non-clustered index are to speed up queries where you regularly search based on a combination of fields, or to make a combination of fields unique in the table. If those are your aims then generally I'd say go for it.
Assuming you are asking about how to define multiple indexes, you can define as many non-clustered indexes as you want. It's the clustered index that you can only have one of, since it determines how rows are clumped together as records on disk.
The more indexes you create, the longer it will take to perform insert, update and delete operations.
Yes, you can have multiple columns in an index, be it clustered or non-clustered. It is very common. It's effect depends entirely on the data in your table.
I have inherited a database where there are clustered indexes and additional duplicate indexes for each of the clustered index.
i.e
IX_PrimaryKey is a clustered index on the column ID.
IX_ID is a non clustered index on the column ID.
I want to clean up these duplicate non clustered indexes and I wanted to check to see if anyone could think of a reason to do this.
Can anyone think of a performance benefit for doing this?
For exact same indexes, there's no performance gain. Actually, it incurs performance loss in insertion and updates. However, if there are multicolumn indexes with different column order, there might be a valid reason for them.
Maybe I'm not thinking hard enough, but I can't see any reason to do this; the nature of the clustered index is that the data is organized in the order of the index. It seems that the extra index is a complete waste.
Digging through BOL and watching this question, though ...
There seems no sensible reason for doing this, and there is a performance hit.
The only thing I could think of to do this is to create an index with an incredibly narrow row width so that the rows per page was very high, making it very quick to scan / seek. But since it contains no other fields (except the clustered key, which is the same value) I still cannot see a reason for it.
It's quite possible the original creator was not aware that the PK was defaulting to a clustered index and created an NC index without realising it was a duplicate.
I presume what would have happened is that SQL Server would have automatically created clustered index when a primary key constraint was specified (this would happen if another index (non-clustered/clustered) is not present already) and then some one might have created a non-clustered index for the primary key column.
Such a scenario would:
Have some adverse effect on performance as indexes are updated when inserts/deletes/updates happen.
Use additional disk space.
Might lead to deadlocks.
Would contribute to more time in backup/restore of database.
cheers
It will be a waste to create a clustered primary key. Unless you have query that search for records using WHERE ID = 10 ?
You may want to create a clustered index on the column which will be frequently queried on WHERE City = 'Sydney'. Clustered means that SQL will group the data in the table based on the clustered index. By grouping the City values in the table means SQL can search for data quicker.
Storing two indexes over the same data is a waste of disk space and the processing needed to maintain the data.
However, I can imagine a product which depends on the existence of an index named IX_PrimaryKey. E.G.
string queryPattern = "select * from {0} as t with (index(IX_PrimaryKey))";
You can make the argument that the clustered index itself occupies much less space than the others, since the leaf is the actual data. On the other hand, the clustered index can be more susceptible to page splitting, and some indexes are better non-clustered.
Putting this together, I can definitely think of scenarios where removing the duplicate indexes would be a Bad Thing:
Code like above which depends on a known index name.
Code which can alter the clustered index to any of the non-clustered indexes.
Code which uses the presence/absence of IX_PrimaryKey to treat the table in a certain way.
I don't consider any of these good design, but I can definitely imagine someone doing it. (Have you posted this to DailyWTF?)
There are cases where it makes sense to have overlapping indexes which are not identical:
create index IX_1 on table1 (ID)
create index IX_2 on table1 (ID, TYPE, ORDER_DATE, TOTAL_CHARGES)
If you are looking up strictly by ID, SQL can optimize and use IX_1. If you are running a query based on ID, TYPE, ORDER_DATE and summing up TOTAL_CHARGES, SQL can use IX_2 as a "covering index", satisfying all the query details from the index without ever touching the table. Generally this is something you add in the course of performance tuning, after extensive testing.
Looking at your given example of two indexes on exactly the same field, I don't see a great fit. Perhaps SQL can use IX_ID as a "covering index" when checking for the existence of a value and bypass some blocking on IX_PrimaryKey?