How to build an index that is best for this SQL? - sql

I'm querying database by a simple SQL like:
SELECT DISTINCT label FROM Document WHERE folderId=123
I need a best index for it.
There are two way to implement this.
(1) create two indexes
CREATE INDEX .. ON Document (label)
CREATE INDEX .. ON Document (folderId)
(2) create a composite index:
CREATE INDEX .. ON Document (folderId, label)
According to the implementation of the database index. Which method is more reasonable? Thank you.

Your second index -- the composite index -- is the best index for this query:
SELECT DISTINCT label
FROM Document
WHERE folderId = 123;
First, the index covers the query so the data pages do not need to be accessed (in most databases).
The index works because the engine can seek to the records with the identified folderId. The information on label can be pulled. Most databases should use the index for pulling the distinct labels as well.

The answer depends also on the used dbms. I'm not sure if all dbms support the use of multiple indexes.
I should think the combined index (folderid, label) is the best solution in your case as it is possible to build your result set solely by using index data. So it doesn't even need to access the real data.
It can even be a strategy to add extra columns to an index so a query which is called a lot can be answered by only accessing the index.

Related

One column index vs 2 column index

I want to execute the next delete query:
delete from MyTable where userId = 5
What of the below indexed will have better performance on this query, or will they both perform the same?
All the mentioned fields in here are BigInt
CREATE INDEX MyTable_UserId_UserBalance_index ON Main.dbo.MyTable (UserId, UserBalance);
CREATE INDEX MyTable_UserId_index ON Main.dbo.MyTable (UserId);
For that particular query they should perform roughly the same. Since you're looking up the first item in the compound index, finding the records should be the same as if there were another index on the single column.
I'm surprised at many of the answers here - for the purpose of deletion, there won't be any benefit using either one of these indexes over the other. For if the record exists in either index, you'll need to find it and remove it.
The purpose of indexes is for reads, not deleting data. If you're trying to read data, you would ask a question like this one, as one index has the potential to return the data quicker than the other. For deletes, you need to delete from all indexes, including the NC indexes.
It seems as if some enlightenment into the world of indexes is being called for.
Some great (free) documentation from well-known DBA - Brent Ozar: https://www.brentozar.com/archive/2016/10/think-like-engine-class-now-free-open-source/
CREATE INDEX MyTable_UserId_index ON Main.dbo.MyTable (UserId);
Both the indexes would work the same way unless the order is reversed for below index.
From:
CREATE INDEX MyTable_UserId_UserBalance_index ON Main.dbo.MyTable (UserId, UserBalance);
To:
CREATE INDEX MyTable_UserId_UserBalance_index ON Main.dbo.MyTable (UserBalance, UserId);
In second case, server may not see this index as useful as UserId is at second level.
Also, why would you create two indexes with same column in it? If you know that your table will be queried frequently with both UserId and UserBalance then, probably it is best to create an index with both columns in it.
Again, just make sure which column gets utilized the most.

How to use index in SQL query

Well i am new to this stuff ..I have created an index in my SP at start like follows
Create Index index_fab
ON TblFab (Fab_name)
Now i have query under this
select fab_name from TblFab where artc = 'x' and atelr = 'y'.
now Is it necessary to use this index name in select clause or it will automatically used to speed up queries
Do i have to use something like
select fab_name from TblFab WITH(INDEX(index_fab)) where artc = 'x' and atelr = 'y'.
or any other method to use this index in query
and also how to use index if we are using join on this table?
Firstly, do you mean you're creating the index in a stored procedure? That's a bad idea - if you run the stored procedure twice, it will fail because the index already exists.
Secondly, your query doesn't use the column mentioned in the index, so it will have no impact.
Thirdly, as JodyT writes, the query analyzer (SQL Server itself) will decide which index to use; it's almost certainly better at it than you are.
Finally, to speed up the query you mention, create an index on columns artc and atelr.
The Query Optimizer of SQL Server will decide if it the index is suitable for the query. You can't force it to use a specific index. You can give hints on which you want it to use but it won't be a guarantee that it will use it.
As the other people answered your question to help you to understand better, my opinion is, you should first understand why you need to use indexes. As we know that indexes increase the performance , they could also cause performance issues as well. Its better to know when you need to use indexes, why you need to use indexes instead of how to use indexes.
You can read almost every little detail from here .
Regarding your example, your query's index has no impact. Because it doesn't have the mentioned column in your query's where clause.
You can also try:
CREATE INDEX yourIndexName
ON yourTableName (column_you_are_looking_for1,column_you_are_lookingfor2)
Also good to know: If no index exists on a table, a table scan must be performed for each table referenced in a database query. The larger the table, the longer a table scan takes because a table scan requires each table row to be accessed sequentially. Although a table scan might be more efficient for a complex query that requires most of the rows in a table, for a query that returns only some table rows an index scan can access table rows more efficiently. (source from here )
Hope this helps.
An index should be used by default if you run a query against the table using it.
But I think in the query you posted it will not be used, because you are not filtering your data by the column you created your index on.
I think you would have to create the index for the artc and atelr columns to profit from that.
To see wether your index is used take a look at the execution plan that was used in the SQL Management Studio.
more info on indices: use the index luke
You dont need to include index in your query. Its managed by sql server. Also you dont need to include index in select if you want to make join to this table. Hope its clear.
You're index use "Fab_name" column which you don't filter on in your select statement, so it's of no use.
Since you're new to this, you might benefit from an index like this :
Create Index index_fab
ON TblFab (artc, atelr)
or maybe like this
Create Index index_fab
ON TblFab (atelr, artc)
...yes there are a lot of subtleties to learn.
For better performance:
List out the columns /tables which are frequently used,
Create index on those tables/columns only.
If index is properly set up, optimizer will use it automatically. By properly set up, I mean that it's selective enough, can effectively help the query etc. Read about it. You can check by yourself if index is being used by using "include actual execution plan" option in ssms.
It's generally not advised to use with(index()) hints and let optimizer decided by itself, except from very special cases when you just know better ;).

How to use index in select statement?

Lets say in the employee table, I have created an index(idx_name) on the emp_name column of the table.
Do I need to explicitly specify the index name in select clause or it will automatically used to speed up queries.
If it is required to be specified in the select clause, What is the syntax for using index in select query ?
If you want to test the index to see if it works, here is the syntax:
SELECT *
FROM Table WITH(INDEX(Index_Name))
The WITH statement will force the index to be used.
Good question,
Usually the DB engine should automatically select the index to use based on query execution plans it builds. However, there are some pretty rare cases when you want to force the DB to use a specific index.
To be able to answer your specific question you have to specify the DB you are using.
For MySQL, you want to read the Index Hint Syntax documentation on how to do this
How to use index in select statement? this way:
SELECT * FROM table1 USE INDEX (col1_index,col2_index)
WHERE col1=1 AND col2=2 AND col3=3;
SELECT * FROM table1 IGNORE INDEX (col3_index)
WHERE col1=1 AND col2=2 AND col3=3;
SELECT * FROM t1 USE INDEX (i1) IGNORE INDEX (i2) USE INDEX (i2);
And many more ways check this
Do I need to explicitly specify?
No, no Need to specify explicitly.
DB engine should automatically select the index to use based on query execution plans it builds from #Tudor Constantin answer.
The optimiser will judge if the use of your index will make your query run faster, and if it is, it will use the index. from #niktrl answer
In general, the index will be used if the assumed cost of using the index, and then possibly having to perform further bookmark lookups is lower than the cost of just scanning the entire table.
If your query is of the form:
SELECT Name from Table where Name = 'Boris'
And 1 row out of 1000 has the name Boris, it will almost certainly be used. If everyone's name is Boris, it will probably resort to a table scan, since the index is unlikely to be a more efficient strategy to access the data.
If it's a wide table (lot's of columns) and you do:
SELECT * from Table where Name = 'Boris'
Then it may still choose to perform the table scan, if it's a reasonable assumption that it's going to take more time retrieving the other columns from the table than it will to just look up the name, or again, if it's likely to be retrieving a lot of rows anyway.
The optimiser will judge if the use of your index will make your query run faster, and if it is, it will use the index.
Depending on your RDBMS you can force the use of an index, although it is not recommended unless you know what you are doing.
In general you should index columns that you use in table join's and where statements
Generally, when you create an index on a table, database will automatically use that index while searching for data in that table. You don't need to do anything about that.
However, in MSSQL, you can specify an index hint which can specify that a particular index should be used to execute this query. More information about this can be found here.
Index hint is also seems to be available for MySQL. Thanks to Tudor Constantine.
By using the column that the index is applied to within your conditions, it will be included automatically. You do not have to use it, but it will speed up queries when it is used.
SELECT * FROM TABLE WHERE attribute = 'value'
Will use the appropriate index.
The index hint is only available for Microsoft Dynamics database servers.
For traditional SQL Server, the filters you define in your 'Where' clause should persuade the engine to use any relevant indices...
Provided the engine's execution plan can efficiently identify how to read the information (whether a full table scan or an indexed scan) - it must compare the two before executing the statement proper, as part of its built-in performance optimiser.
However, you can force the optimiser to scan by using something like
Select *
From [yourtable] With (Index(0))
Where ...
Or to seek a particular index by using something like
Select *
From [yourtable] With (Index(1))
Where ...
The choice is yours. Look at the table's index properties in the object panel to get an idea of which index you want to use. It ought to match your filter(s).
For best results, list the filters which would return the fewest results first.
I don't know if I'm right in saying, but it seems like the query filters are sequential; if you get your sequence right, the optimiser shouldn't have to do it for you by comparing all the combinations, or at least not begin the comparison with the more expensive queries.

Why can't I simply add an index that includes all columns?

I have a table in SQL Server database which I want to be able to search and retrieve data from as fast as possible. I don't care about how long time it takes to insert into the table, I am only interested in the speed at which I can get data.
The problem is the table is accessed with 20 or more different types of queries. This makes it a tedious task to add an index specially designed for each query. I'm considering instead simply adding an index that includes ALL columns of the table. It's not something you would normally do in "good" database design, so I'm assuming there is some good reason why I shouldn't do it.
Can anyone tell me why I shouldn't do this?
UPDATE: I forgot to mention, I also don't care about the size of my database. It's OK that it means my database size will grow larger than it needed to
First of all, an index in SQL Server can only have at most 900 bytes in its index entry. That alone makes it impossible to have an index with all columns.
Most of all: such an index makes no sense at all. What are you trying to achieve??
Consider this: if you have an index on (LastName, FirstName, Street, City), that index will not be able to be used to speed up queries on
FirstName alone
City
Street
That index would be useful for searches on
(LastName), or
(LastName, FirstName), or
(LastName, FirstName, Street), or
(LastName, FirstName, Street, City)
but really nothing else - certainly not if you search for just Street or just City!
The order of the columns in your index makes quite a difference, and the query optimizer can't just use any column somewhere in the middle of an index for lookups.
Consider your phone book: it's order probably by LastName, FirstName, maybe Street. So does that indexing help you find all "Joe's" in your city? All people living on "Main Street" ?? No - you can lookup by LastName first - then you get more specific inside that set of data. Just having an index over everything doesn't help speed up searching for all columns at all.
If you want to be able to search by Street - you need to add a separate index on (Street) (and possibly another column or two that make sense).
If you want to be able to search by Occupation or whatever else - you need another specific index for that.
Just because your column exists in an index doesn't mean that'll speed up all searches for that column!
The main rule is: use as few indices as possible - too many indices can be even worse for a system than having no indices at all.... build your system, monitor its performance, and find those queries that cost the most - then optimize these, e.g. by adding indices.
Don't just blindly index every column just because you can - this is a guarantee for lousy system performance - any index also requires maintenance and upkeep, so the more indices you have, the more your INSERT, UPDATE and DELETE operations will suffer (get slower) since all those indices need to be updated.
You are having a fundamental misunderstanding how indexes work.
Read this explanation "how multi-column indexes work".
The next question you might have is why not creating one index per column--but that's also a dead-end if you try to reach top select performance.
You might feel that it is a tedious task, but I would say it's a required task to index carefully. Sloppy indexing strikes back, as in this example.
Note: I am strongly convinced that proper indexing pays off and I know that many people are having the very same questions you have. That's why I'm writing a the a free book about it. The links above refer the pages that might help you to answer your question. However, you might also want to read it from the beginning.
...if you add an index that contains all columns, and a query was actually able to use that index, it would scan it in the order of the primary key. Which means hitting nearly every record. Average search time would be O(n/2).. the same as hitting the actual database.
You need to read a bit lot about indexes.
It might help if you consider an index on a table to be a bit like a Dictionary in C#.
var nameIndex = new Dictionary<String, List<int>>();
That means that the name column is indexed, and will return a list of primary keys.
var nameOccupationIndex = new Dictionary<String, List<Dictionary<String, List<int>>>>();
That means that the name column + occupation columns are indexed. Now imagine the index contained 10 different columns, nested so far deep it contains every single row in your table.
This isn't exactly how it works mind you. But it should give you an idea of how indexes could work if implemented in C#. What you need to do is create indexes based on one or two keys that are queried on extensively, so that the index is more useful than scanning the entire table.
If this is a data warehouse type operation where queries are highly optimized for READ queries, and if you have 20 ways of dissecting the data, e.g.
WHERE clause involves..
Q1: status, type, customer
Q2: price, customer, band
Q3: sale_month, band, type, status
Q4: customer
etc
And you absolutely have plenty of fast storage space to burn, then by all means create an index for EVERY single column, separately. So a 20-column table will have 20 indexes, one for each individual column. I could probably say to ignore bit columns or low cardinality columns, but since we're going so far, why bother (with that admonition). They will just sit there and churn the WRITE time, but if you don't care about that part of the picture, then we're all good.
Analyze your 20 queries, and if you have hot queries (the hottest ones) that still won't go any faster, plan it using SSMS (press Ctrl-L) with one query in the query window. It will tell you what index can help that queries - just create it; create them all, fully remembering that this adds again to the write cost, backup file size, db maintenance time etc.
I think the questioner is asking
'why can't I make an index like':
create index index_name
on table_name
(
*
)
The problems with that have been addressed.
But given it sounds like they are using MS sql server.
It's useful to understand that you can include nonkey columns in an index so they the values of those columns are available for retrieval from the index, but not to be used as selection criteria :
create index index_name
on table_name
(
foreign_key
)
include (a,b,c,d) -- every column except foreign key
I created two tables with a million identical rows
I indexed table A like this
create nonclustered index index_name_A
on A
(
foreign_key -- this is a guid
)
and table B like this
create nonclustered index index_name_B
on B
(
foreign_key -- this is a guid
)
include (id,a,b,c,d) -- ( every key except foreign key)
no surprise, table A was slightly faster to insert to.
but when I and ran these this queries
select * from A where foreign_key = #guid
select * from B where foreign_key = #guid
On table A, sql server didn't even use the index, it did a table scan, and complained about a missing index including id,a,b,c,d
On table B, the query was over 50 times faster with much less io
forcing the query on A to use the index didn't make it any faster
select * from A where foreign_key = #guid
select * from A with (index(index_name_A)) where foreign_key = #guid
I'm considering instead simply adding an index that includes ALL columns of the table.
This is always a bad idea. Indexes in database is not some sort of pixie dust that works magically. You have to analyze your queries and according to what and how is being queried - append indexes.
It is not as simple as "add everything to index and have a nap"
I see only long and complicated answers here so I thought I should give the simplest answer possible.
You cannot add an entire table, or all its columns, to an index because that just duplicates the table.
In simple terms, an index is just another table with selected data ordered in the order you normally expect to query it in, and a pointer to the row on disk where the rest of the data lives.
So, a level of indirection exists. You have a partial copy of a table in an preordered manner (both on disk and in RAM, assuming the index is not fragmented), which is faster to query for the columns defined in the index only, while the rest of the columns can be fetched without having to scan the disk for them, because the index contains a reference to the correct position on disk where the rest of the data is for each row.
1) size, an index essentially builds a copy of the data in that column some easily searchable structure, like a binary tree (I don't know SQL Server specifcs).
2) You mentioned speed, index structures are slower to add to.
That index would just be identical to your table (possibly sorted in another order).
It won't speed up your queries.

RDBS when to use complex indexes for queries and when use simple?

Suppose I have table in my DB schema called TEST with fields (id, name, address, phone, comments). Now, I know that I'm going to perform a large set of different queries for that table, therefore my question is next, when and why I shall create indexes like ID_NAME_INDX (index for id and name) and when it's more efficient to create separately index for id and index for name field(by when I mean for what type of query)?
The general aim would be to "cover" all columns so the query only has to use the index.
-- An index on Name including ID would be ideal
SELECT
[id]
FROM
TEST
WHERE
[name] = 'bob'
Say you need name and indx but have separate indexes. You'll end up with a bookmark lookup from the index to the PK to get the other columns (assuming it doesn't just scan the PK)
Edit, after 1st comment:
select * from test where id='id1' and name='Name1'
For this query, the SELECT * but mitigates against any index so the PK would be used.
If you had:
select address from test where id='id1' and name='Name1'
then an index on ID, name including address would "cover" it.
Using "OR" creates difficulties for any strategy. However,
select address from test where id='id1' and name='Name1'
would still use the "ID, name including address" inex most likely but scan it rather that seek
Read this: Execution Plan Basics
I'm not sure your example explains the actual question you're asking. You're saying if you should have an index on ID and and index on Name, as opposed to an index on both ID and Name. The thing is, I guess that ID is your primary key and so you're not likely to do a search on ID AND Name.
However, in the terms of a table with two ID's of which you would want to search on either one, or both together then having three indexes, one on each of the ID's and one combined will be the fastest. If you have two indexes then to find the record you're looking for both indexes will need to be searched. However, if you have one index covering both ID's then only that index will need to be searched.
As with all indexes though, as you add them, your database increases in size and you will get a reduction on insert / update performance. You always need to weigh up the gains / losses.
Add indexes to the absolutely obvious candidates, add indexes to the "maybe" ones as the need arises. Continue to monitor your database performance and run query analysers to see where any performance gains can be made over time.
Most database software include some sort of a tool to debug your queries. These can usually tell you which indexes the server considered and which it ended up using. This functionality is usually called explain or something similar.
Usually you should create indexes for columns that are used in the where clause or joins.