How to use an index in a SQL query

Well, I am new to this stuff. I have created an index at the start of my SP, as follows:
Create Index index_fab
ON TblFab (Fab_name)
Now I have this query under it:
select fab_name from TblFab where artc = 'x' and atelr = 'y';
Is it necessary to use this index name in the SELECT statement, or will it automatically be used to speed up queries?
Do I have to use something like
select fab_name from TblFab WITH (INDEX(index_fab)) where artc = 'x' and atelr = 'y';
or some other method to use this index in the query?
And also, how do I use the index if I am joining this table?

Firstly, do you mean you're creating the index in a stored procedure? That's a bad idea - if you run the stored procedure twice, it will fail because the index already exists.
Secondly, your query doesn't filter on the column in the index (Fab_name appears only in the SELECT list), so the index will have no impact.
Thirdly, as JodyT writes, the query optimizer (SQL Server itself) will decide which index to use; it's almost certainly better at it than you are.
Finally, to speed up the query you mention, create an index on columns artc and atelr.
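The second and last points can be checked directly by comparing query plans. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. That substitution is an assumption: SQLite's planner and its EXPLAIN QUERY PLAN output are not SQL Server's, but the principle carries over. The table mirrors TblFab from the question, and the index name idx_artc_atelr is hypothetical. The sketch also uses CREATE INDEX IF NOT EXISTS, which is SQLite syntax, so that re-running the setup does not fail with "index already exists".

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


def build_schema(conn):
    # Safe to run more than once: IF NOT EXISTS skips objects that already exist.
    conn.execute("CREATE TABLE IF NOT EXISTS TblFab (fab_name TEXT, artc TEXT, atelr TEXT)")
    conn.execute("CREATE INDEX IF NOT EXISTS index_fab ON TblFab (fab_name)")


conn = sqlite3.connect(":memory:")
build_schema(conn)
build_schema(conn)  # a second run does not raise an error

query = "SELECT fab_name FROM TblFab WHERE artc = 'x' AND atelr = 'y'"

# Only index_fab exists. It is on fab_name, which the WHERE clause never filters on.
before = query_plan(conn, query)

# An index on the filtered columns (hypothetical name idx_artc_atelr).
conn.execute("CREATE INDEX IF NOT EXISTS idx_artc_atelr ON TblFab (artc, atelr)")
after = query_plan(conn, query)

print("before:", before)  # expect a full SCAN of TblFab
print("after: ", after)   # expect a SEARCH using idx_artc_atelr
```

Nothing in the SELECT names an index. The planner picks idx_artc_atelr on its own once it exists.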

The query optimizer of SQL Server will decide whether the index is suitable for the query, so you don't need to reference it. You can give it a hint naming the index you want it to use, but that overrides the optimizer's own judgement and is rarely a good idea.

The other answers address your question directly. To help you understand it better, my opinion is that you should first understand why you need indexes. Although indexes can improve performance, they can also cause performance problems (for example, every index has to be maintained on INSERT, UPDATE and DELETE). It's better to know when and why you need indexes before learning how to use them.
You can read almost every detail here.
Regarding your example, your index has no impact on this query, because the indexed column is not in your query's WHERE clause.
You can also try:
CREATE INDEX yourIndexName
ON yourTableName (column_you_are_looking_for1, column_you_are_looking_for2)
Also good to know: if no index exists on a table, a table scan must be performed for each table referenced in a database query. The larger the table, the longer a table scan takes, because a table scan requires each table row to be accessed sequentially. Although a table scan might be more efficient for a complex query that requires most of the rows in a table, for a query that returns only some table rows an index scan can access table rows more efficiently. (source: here)
Hope this helps.

An index is used automatically when the optimizer decides it helps a query against that table; you don't have to reference it.
But I think it will not be used in the query you posted, because you are not filtering your data by the column you created the index on.
I think you would have to create the index on the artc and atelr columns to benefit from it.
To see whether your index is used, look at the execution plan in SQL Server Management Studio.
More info on indexes: Use The Index, Luke

You don't need to include the index in your query; it's managed by SQL Server. You also don't need to reference the index in the SELECT if you join this table. Hope that's clear.

Your index uses the Fab_name column, which you don't filter on in your SELECT statement, so it's of no use here.
Since you're new to this, you might benefit from an index like this:
Create Index index_fab
ON TblFab (artc, atelr)
or maybe like this
Create Index index_fab
ON TblFab (atelr, artc)
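Which of the two orders is better depends on which columns your other queries filter on. The rough sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server. That is an assumption, and plan wording differs between engines. It creates both variants under hypothetical names and shows the leading-column rule: a filter on one column alone can only seek the index that starts with that column.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TblFab (fab_name TEXT, artc TEXT, atelr TEXT)")
# Hypothetical names for the two variants suggested above.
conn.execute("CREATE INDEX idx_artc_atelr ON TblFab (artc, atelr)")
conn.execute("CREATE INDEX idx_atelr_artc ON TblFab (atelr, artc)")

# Filtering on artc alone can only seek the index whose first column is artc...
artc_only = query_plan(conn, "SELECT fab_name FROM TblFab WHERE artc = 'x'")
# ...and filtering on atelr alone can only seek the one whose first column is atelr.
atelr_only = query_plan(conn, "SELECT fab_name FROM TblFab WHERE atelr = 'y'")

print(artc_only)
print(atelr_only)
```

If some queries filter only on artc and others only on atelr, a single composite index cannot serve both as a seek. Putting the more commonly filtered column first is the usual compromise.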
...yes there are a lot of subtleties to learn.

For better performance:
List the columns/tables that are frequently used in WHERE clauses and joins.
Create indexes on those tables/columns only.

If an index is properly set up, the optimizer will use it automatically. By properly set up, I mean that it's selective enough, can effectively help the query, etc. Read about it. You can check for yourself whether an index is used with the "Include Actual Execution Plan" option in SSMS.
It's generally advised not to use WITH (INDEX(...)) hints and to let the optimizer decide by itself, except in very special cases when you just know better ;).

Related

SQL question on indexes. How to call the index that I have created

How can I call the indexes that I have created for a table in SQL?
For example I have created an index:
CREATE INDEX idx_pname ON Persons (LastName, FirstName);
How can I call it?
Indexes are background data structures, which you cannot call directly. The query optimizer uses them internally to fetch data when it finds that an index is a good way to get the data efficiently.
If you want to force the query optimizer to use an index, you can use an index hint. Below is an example.
SELECT FirstName, LastName FROM Persons WITH (INDEX(idx_pname))
But I would always suggest not using index hints, as they can bring in lots of issues:
When the statistics change, a forced index can lead to performance issues.
If the index is dropped, the query will no longer work.
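The second risk is easy to reproduce. SQLite has no WITH (INDEX(...)) hint, but its INDEXED BY clause plays a similar role. This sketch therefore uses SQLite through Python's sqlite3 module as a stand-in, which is an assumption. SQL Server reports its own error text, but it also refuses to run a query whose hinted index no longer exists. The Persons table and idx_pname come from the question; the sample row is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (LastName TEXT, FirstName TEXT)")
conn.execute("CREATE INDEX idx_pname ON Persons (LastName, FirstName)")
conn.execute("INSERT INTO Persons VALUES ('Smith', 'Ann')")  # made-up sample row

# INDEXED BY makes SQLite use idx_pname (roughly like a forced index hint).
forced = "SELECT FirstName, LastName FROM Persons INDEXED BY idx_pname WHERE LastName = 'Smith'"
rows = conn.execute(forced).fetchall()
print(rows)

# Once the index is dropped, the very same statement no longer runs at all.
conn.execute("DROP INDEX idx_pname")
try:
    conn.execute(forced).fetchall()
    failed = False
except sqlite3.DatabaseError as exc:
    failed = True
    print("error:", exc)
```

Without the hint, dropping the index would only make the query slower. With the hint, the query breaks.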
If you want to see which indexes exist on a table, you can use sp_help to see the table information.
EXEC sp_help 'dbo.Persons'
Also, you can use sp_helpindex
EXEC sp_helpindex 'dbo.Persons'
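For comparison outside SQL Server: SQLite has no sp_helpindex, but its PRAGMA index_list and PRAGMA index_info statements return similar information. Here is a minimal sketch using Python's sqlite3. SQLite is an assumption, not part of the original answer, and list_indexes is a hypothetical helper name.

```python
import sqlite3


def list_indexes(conn, table):
    """Return {index_name: [column, ...]} for a table, like a tiny sp_helpindex.

    The table name is interpolated into the PRAGMA, so only pass trusted names.
    """
    result = {}
    # index_list rows start with (seq, name, unique, ...); the name is column 1.
    for row in conn.execute(f"PRAGMA index_list('{table}')").fetchall():
        name = row[1]
        # index_info rows are (seqno, cid, column_name); sort by seqno for key order.
        info = conn.execute(f"PRAGMA index_info('{name}')").fetchall()
        result[name] = [col for _, _, col in sorted(info)]
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (LastName TEXT, FirstName TEXT)")
conn.execute("CREATE INDEX idx_pname ON Persons (LastName, FirstName)")
indexes = list_indexes(conn, "Persons")
print(indexes)  # {'idx_pname': ['LastName', 'FirstName']}
```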
More on sp_helpindex
Thanks for the answers, I understand it now. So basically the database uses indexes when it decides they are needed, which is useful for optimizing query searches. We simply create an index when we expect a column to be searched often.
Indexes defined on the fields of a database table are part of the database schema. Their purpose is to speed up data retrieval: the indexed fields are kept in a specific sort order according to the index definition, so a lookup can go straight to the matching records instead of reading the whole table. As part of the query execution plan, indexes are applied according to the SELECT query definition.
To determine whether indexes are used by a SELECT operation, use the EXPLAIN statement (for example, in MySQL or PostgreSQL):
EXPLAIN SELECT statement

Why is the "IN" clause so costly in SQL stored procedures?

How can I improve the performance of my query? I have a SQL query with 'IN', and I guess the 'IN' is causing a costly performance issue. Do I need to add an index for my query?
My sql query:
SELECT [p].[ReferencedxxxId]
FROM [Common].[xxxReference] AS [p]
WHERE ([p].[IsDeleted] = 0)
AND (([p].[ReferencedxyzType] = @__refxyzType_0)
AND [p].[ReferencedxxxId] IN ('42342','ffsdfd','5345345345'))
My solution (BUT I NEED YOUR HELP FOR BETTER ADVICE): which one is correct, a clustered or a nonclustered index?
USE [xxx]
GO
CREATE NONCLUSTERED INDEX IX_NonClusteredIndexDemo_xxxId
ON [Common].[xxxReference](xxxId)
INCLUDE ([ID],[ReferencedxxxId])
WITH (DROP_EXISTING=ON, ONLINE=ON, FILLFACTOR=90)
GO
Second:
CREATE INDEX xxxReference_ReferencedxxxId_index
ON [Common].[xxxReference] (ReferencedxxxId)
Which one is correct, or do you have a better solution?
The performance problem of this query is not the result of using the IN operator.
This operator performs very well with small lists (say, less than 1000 members).
The performance bottleneck here is that SQL Server performs a (very costly) index scan instead of an index seek, plus a key lookup that accounts for 20% of the query cost.
To avoid both problems, you can add an index on IsDeleted, ReferencedxyzType and ReferencedxxxId - probably in this exact order.
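You can check advice like this before touching production by comparing plans with and without the index. The sketch below uses Python's sqlite3 as a stand-in for SQL Server, which is an assumption. SQLite has no schemas, so [Common].[xxxReference] becomes a plain xxxReference table. The type parameter is bound through a ? placeholder, and the index name IX_xxxReference_Filter is hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xxxReference ("
    " ID INTEGER PRIMARY KEY, IsDeleted INTEGER,"
    " ReferencedxyzType INTEGER, ReferencedxxxId TEXT)"
)

sql = (
    "SELECT ReferencedxxxId FROM xxxReference "
    "WHERE IsDeleted = 0 AND ReferencedxyzType = ? "
    "AND ReferencedxxxId IN ('42342', 'ffsdfd', '5345345345')"
)

without_index = query_plan(conn, sql, (1,))

# Column order as suggested above: IsDeleted, ReferencedxyzType, ReferencedxxxId.
conn.execute(
    "CREATE INDEX IX_xxxReference_Filter "
    "ON xxxReference (IsDeleted, ReferencedxyzType, ReferencedxxxId)"
)
with_index = query_plan(conn, sql, (1,))

print(without_index)  # expect a SCAN of the whole table
print(with_index)     # expect a SEARCH on IX_xxxReference_Filter
```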
SQL performance tuning is a science that tends to look a little like art or magic. Either way you look at it, it requires good knowledge of both the theory and the practice of indexing, and of the relevant system's requirements.
Therefore, my suggestion is this: do not attempt to solve it yourself with the help of strangers on the internet. Get an expert on a consulting job for a couple of hours or days to analyze the system and help you fine-tune it.
Learn whatever you can during this process. Ask questions about everything that is not trivial. This will be money well spent.
A couple of things:
If you have a SELECT statement inside the IN, that should be avoided and replaced with an EXISTS clause. In your example above, that is not relevant, as you have literal values inside the IN. Using EXISTS and NOT EXISTS instead of IN and NOT IN means SQL Server does not need to compare each value of the column against every value inside the IN / NOT IN; it can short-circuit the search once a match or non-match is found.
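Here is a minimal sketch of the rewrite described in the first point, using Python's sqlite3 and two hypothetical tables (Employee and EmployeeType, with a hypothetical IsActive flag). Both forms return the same rows. Whether the EXISTS form actually runs faster depends on the engine and the plan it builds, so compare the execution plans rather than assuming.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EmployeeType (EmployeeTypeId INTEGER PRIMARY KEY, IsActive INTEGER);
    CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, EmployeeTypeId INTEGER);
    INSERT INTO EmployeeType VALUES (1, 1), (2, 0), (3, 1);
    INSERT INTO Employee VALUES (10, 1), (11, 2), (12, 3), (13, 3);
""")

# IN with a subquery: the form the answer suggests avoiding.
in_form = """
    SELECT EmployeeId FROM Employee
    WHERE EmployeeTypeId IN (SELECT EmployeeTypeId FROM EmployeeType WHERE IsActive = 1)
    ORDER BY EmployeeId
"""

# EXISTS with a correlated subquery: it can stop at the first matching row.
exists_form = """
    SELECT e.EmployeeId FROM Employee AS e
    WHERE EXISTS (
        SELECT 1 FROM EmployeeType AS t
        WHERE t.EmployeeTypeId = e.EmployeeTypeId AND t.IsActive = 1
    )
    ORDER BY e.EmployeeId
"""

in_rows = conn.execute(in_form).fetchall()
exists_rows = conn.execute(exists_form).fetchall()
print(in_rows)      # [(10,), (12,), (13,)]
print(exists_rows)  # [(10,), (12,), (13,)]
```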
Avoid implicit conversions. They degrade performance for several reasons: (i) SQL Server may not be able to use the statistics on an index, and therefore may not be able to use the index, falling back to the clustered index on the table instead (which may not cover your query); (ii) the query may not be granted the right amount of memory during the memory-grant phase; (iii) cardinality estimation goes wrong, because SQL Server has statistics on the column but not on the converted value of that column.
If you look at your execution plan posted above, you will see a yellow warning mark on the SELECT operator. If you hover over it, you will see one or more warning messages. If a warning is about implicit conversion, use matching data types in the comparison.
For example, what is the data type of the column [ReferencedxxxId]? If it is not NVARCHAR but VARCHAR, then I would suggest:
Make the values inside the IN as VARCHAR (currently you are making them NVARCHAR). This way you will still be able to take full advantage of the rowstore index created on [ReferencedxxxId] column.
If you must have the values as NVARCHAR inside the IN clause, then you should:
CONVERT/CAST the column [ReferencedxxxId] in your IN clause. This gets rid of the implicit conversion, but you will no longer be able to take full advantage of the rowstore index on the [ReferencedxxxId] column.
To compensate, create a clustered/nonclustered columnstore index on the table covering the columns used in the query. This should significantly improve the performance of your SELECT query.
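The cost of wrapping the column in CONVERT/CAST can be seen in a small sketch. It uses Python's sqlite3 as a stand-in for SQL Server, which is an assumption. SQLite has no VARCHAR/NVARCHAR distinction, so this only illustrates the effect of applying a function to the indexed column, not the implicit conversion itself. The index name IX_ReferencedxxxId is hypothetical.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xxxReference (ReferencedxxxId TEXT, IsDeleted INTEGER)")
conn.execute("CREATE INDEX IX_ReferencedxxxId ON xxxReference (ReferencedxxxId)")

# Bare column compared with a literal: the index can be seeked.
plain = query_plan(
    conn, "SELECT IsDeleted FROM xxxReference WHERE ReferencedxxxId = '42342'"
)
# Column wrapped in CAST: the expression has to be computed for every row.
wrapped = query_plan(
    conn, "SELECT IsDeleted FROM xxxReference WHERE CAST(ReferencedxxxId AS TEXT) = '42342'"
)

print(plain)    # expect a SEARCH using IX_ReferencedxxxId
print(wrapped)  # expect a SCAN of xxxReference
```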
If you decide to go with the rowstore index by correcting the values inside the IN, make sure that you create a clustered/nonclustered index that covers the query. That means the index key covers the columns you search on ([ReferencedxxxId], [ReferencedxyzType], [IsDeleted]), and the columns used in the SELECT list are added under the INCLUDE clause (if it is a nonclustered index).
Also, when you create a composite rowstore index, try to order the columns from high cardinality to low cardinality, left to right, to make the best use of the index.
Assuming an OLTP system rather than OLAP, my first pass would be a nonclustered index. Given that IsDeleted is likely to have the least selectivity, I would place it last: a nonclustered index on ReferencedxyzType, ReferencedxxxId, IsDeleted.
I might even be tempted in a higher volume scenario to move the IsDeleted out of the index onto an include instead, since it provides so little selectivity to the index itself.
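SQLite, used below through Python's sqlite3 as a rough stand-in (an assumption), has no INCLUDE clause. The sketch can therefore only compare two indexes: one that leaves IsDeleted out entirely, and one that carries it as the last key column. In SQL Server, the INCLUDE variant mentioned above would likewise avoid the trip back to the table. SQLite's plan says COVERING INDEX when the index alone can answer the query. The index names are hypothetical.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xxxReference ("
    " ID INTEGER PRIMARY KEY, IsDeleted INTEGER,"
    " ReferencedxyzType INTEGER, ReferencedxxxId TEXT)"
)
sql = (
    "SELECT ReferencedxxxId FROM xxxReference "
    "WHERE IsDeleted = 0 AND ReferencedxyzType = 1 "
    "AND ReferencedxxxId IN ('42342', 'ffsdfd')"
)

# IsDeleted left out: every matching entry needs a table lookup to check it.
conn.execute("CREATE INDEX IX_Type_Id ON xxxReference (ReferencedxyzType, ReferencedxxxId)")
partial = query_plan(conn, sql)
conn.execute("DROP INDEX IX_Type_Id")

# IsDeleted added as the last key column: the index alone answers the query.
conn.execute(
    "CREATE INDEX IX_Type_Id_Deleted "
    "ON xxxReference (ReferencedxyzType, ReferencedxxxId, IsDeleted)"
)
covering = query_plan(conn, sql)

print(partial)   # uses IX_Type_Id, but not as a covering index
print(covering)  # uses IX_Type_Id_Deleted as a covering index
```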
There is clearly already a clustered index on the table (we can see it in the query plan), but we don't have the details of what is in it.
The question around clustered vs non-clustered is more complex and requires a lot more knowledge of the system and usage.

How to use index in select statement?

Let's say that in the employee table I have created an index (idx_name) on the emp_name column.
Do I need to explicitly specify the index name in the SELECT statement, or will it automatically be used to speed up queries?
If it is required, what is the syntax for using an index in a SELECT query?
If you want to test the index to see if it works, here is the syntax:
SELECT *
FROM Table WITH(INDEX(Index_Name))
The WITH (INDEX(...)) table hint forces the index to be used.
Good question,
Usually the DB engine should automatically select the index to use based on query execution plans it builds. However, there are some pretty rare cases when you want to force the DB to use a specific index.
To be able to answer your specific question you have to specify the DB you are using.
For MySQL, you want to read the Index Hint Syntax documentation on how to do this
How to use an index in a SELECT statement? Like this:
SELECT * FROM table1 USE INDEX (col1_index,col2_index)
WHERE col1=1 AND col2=2 AND col3=3;
SELECT * FROM table1 IGNORE INDEX (col3_index)
WHERE col1=1 AND col2=2 AND col3=3;
SELECT * FROM t1 USE INDEX (i1) IGNORE INDEX (i2) USE INDEX (i2);
And there are many more ways; check this.
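Other engines spell these hints differently. SQLite, for instance, offers INDEXED BY, which behaves like a forced index, and NOT INDEXED, which is roughly like ignoring every index on that table. The sketch below tries both through Python's sqlite3. Using SQLite here is an assumption beyond the MySQL answer above; the table1/col1_index/col3_index names follow the MySQL examples.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
conn.execute("CREATE INDEX col1_index ON table1 (col1)")
conn.execute("CREATE INDEX col3_index ON table1 (col3)")

where = "WHERE col1 = 1 AND col2 = 2 AND col3 = 3"

# Force one specific index; SQLite raises an error if it cannot use it.
forced = query_plan(conn, f"SELECT * FROM table1 INDEXED BY col1_index {where}")
# Forbid all indexes on table1 for this query, which leaves a full table scan.
no_index = query_plan(conn, f"SELECT * FROM table1 NOT INDEXED {where}")

print(forced)
print(no_index)
```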
Do I need to explicitly specify it?
No, there is no need to specify it explicitly.
The DB engine should automatically select the index to use based on the query execution plans it builds (from @Tudor Constantin's answer).
The optimiser will judge whether using your index will make your query run faster, and if it will, it will use the index (from @niktrl's answer).
In general, the index will be used if the assumed cost of using the index, and then possibly having to perform further bookmark lookups is lower than the cost of just scanning the entire table.
If your query is of the form:
SELECT Name from Table where Name = 'Boris'
And 1 row out of 1000 has the name Boris, it will almost certainly be used. If everyone's name is Boris, it will probably resort to a table scan, since the index is unlikely to be a more efficient strategy to access the data.
If it's a wide table (lots of columns) and you do:
SELECT * from Table where Name = 'Boris'
Then it may still choose to perform the table scan, if it's a reasonable assumption that it's going to take more time retrieving the other columns from the table than it will to just look up the name, or again, if it's likely to be retrieving a lot of rows anyway.
The optimiser will judge if the use of your index will make your query run faster, and if it is, it will use the index.
Depending on your RDBMS you can force the use of an index, although it is not recommended unless you know what you are doing.
In general, you should index columns that you use in table joins and WHERE clauses.
Generally, when you create an index on a table, database will automatically use that index while searching for data in that table. You don't need to do anything about that.
However, in MSSQL, you can specify an index hint which can specify that a particular index should be used to execute this query. More information about this can be found here.
Index hints also seem to be available in MySQL. Thanks to Tudor Constantin.
If you use the indexed column in your conditions, the index will be considered automatically. You don't have to reference it, but it will speed up queries when it is used.
SELECT * FROM TABLE WHERE attribute = 'value'
Will use the appropriate index.
The index hint is only available for Microsoft Dynamics database servers.
For traditional SQL Server, the filters you define in your 'Where' clause should persuade the engine to use any relevant indices...
Before executing the statement proper, the engine's built-in optimiser compares the possible ways of reading the data (for example, a full table scan versus an indexed seek) and picks the one it estimates to be cheapest.
However, you can force the optimiser to scan by using something like
Select *
From [yourtable] With (Index(0))
Where ...
Or to use the clustered index (index ID 1), seeking it where possible, by using something like
Select *
From [yourtable] With (Index(1))
Where ...
The choice is yours. Look at the table's index properties in Object Explorer to get an idea of which index you want to use. It ought to match your filter(s).
For best results, list the filters that would return the fewest results first.
I'm not sure I'm right about this, but it seems as if the query filters are applied sequentially; if you get the sequence right, the optimiser shouldn't have to do it for you by comparing all the combinations, or at least won't begin the comparison with the more expensive ones.

Tuning table select SQL having a RAW column in Oracle 10g

I have a table with several columns and a unique RAW column. I created a unique index on the RAW column.
My query selects all columns from the table (6 million rows).
When I look at the cost of the query, it's too high (51K), and it's still using an INDEX FULL SCAN. The query doesn't have any filter conditions; it's a plain SELECT * FROM.
Please suggest how I can tune the query.
Thanks in advance.
Why are you hinting it to use the index if you're retrieving all columns from all rows? The index would only help if you were filtering on the indexed column. If you were only retrieving the indexed column then an INDEX_FFS hint might help. But if you have to go back to the data for any non-indexed columns then using the index at all becomes counterproductive beyond a certain proportion of returned data as you're having to access both the index data blocks and the table data blocks repeatedly.
So, your query is:
select /*+ index (rawdata idx_test) */
rawdata.*
from v_wis_cds_cp_rawdata_test rawdata
and you want to know why Oracle is choosing an INDEX FULL scan?
Well, as Alex said, the reason is the "index (rawdata idx_test)" hint. This is a directive that tells the Oracle optimizer, "when you access rawdata, use an index access on the idx_test index", which means that's what Oracle will do if at all possible, even if that's not the best plan.
Hints don't make queries faster automatically. They are a way of telling the optimizer what not to do.
I've seen queries like this before - sometimes a hint like this is added in order to return the rows in sorted order, without actually doing a sort. However, if this was the requirement, I'd strongly recommend adding an ORDER BY clause in anyway, because if the hint becomes invalid for some reason (e.g. the index gets dropped or renamed), the sorting would no longer happen and no error would be reported.
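Here is a minimal sketch of the ORDER BY point, using Python's sqlite3 as a stand-in for Oracle. That is an assumption: SQLite has no optimizer hints, so only the ORDER BY side is shown, and a BLOB column stands in for Oracle's RAW. The table and index names echo the question, and the data is made up. Because the order is part of the query itself, it survives the index being dropped; only the way the engine produces it changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rawdata (raw_key BLOB, payload TEXT)")
conn.executemany(
    "INSERT INTO rawdata VALUES (?, ?)",
    [(bytes([b]), f"row {b}") for b in (7, 3, 9, 1)],  # made-up rows
)
conn.execute("CREATE UNIQUE INDEX idx_test ON rawdata (raw_key)")

query = "SELECT raw_key FROM rawdata ORDER BY raw_key"

with_index = [row[0] for row in conn.execute(query)]
conn.execute("DROP INDEX idx_test")  # e.g. the index is dropped or renamed later
without_index = [row[0] for row in conn.execute(query)]

print(with_index == without_index)  # True
```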
If you don't need the rows returned in any particular order, I suggest you remove the hint and see if the performance improves.

Do indexes work with "IN" clause

If I have a query like:
Select EmployeeId
From Employee
Where EmployeeTypeId IN (1,2,3)
and I have an index on the EmployeeTypeId field, does SQL server still use that index?
Yeah, that's right. If your Employee table has 10,000 records, and only 5 records have EmployeeTypeId in (1,2,3), then it will most likely use the index to fetch the records. However, if it finds that 9,000 records have the EmployeeTypeId in (1,2,3), then it would most likely just do a table scan to get the corresponding EmployeeIds, as it's faster just to run through the whole table than to go to each branch of the index tree and look at the records individually.
SQL Server does a lot of stuff to try and optimize how the queries run. However, sometimes it doesn't get the right answer. If you know that SQL Server isn't using the index, by looking at the execution plan in query analyzer, you can tell the query engine to use a specific index with the following change to your query.
SELECT EmployeeId FROM Employee WITH (INDEX(Index_EmployeeTypeId)) WHERE EmployeeTypeId IN (1,2,3)
Assuming the index you have on the EmployeeTypeId field is named Index_EmployeeTypeId.
Usually it would, unless the IN clause covers too much of the table, and then it will do a table scan. Best way to find out in your specific case would be to run it in the query analyzer, and check out the execution plan.
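As a quick check of what these answers describe, here is a minimal sketch with Python's sqlite3 standing in for SQL Server. That is an assumption, and SQLite's plan text differs from Query Analyzer's. It shows an IN list on an indexed column being answered by an index search, with SQLite probing the index once per value in the list. Index_EmployeeTypeId is the name assumed in the answer above.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, EmployeeTypeId INTEGER)")
conn.execute("CREATE INDEX Index_EmployeeTypeId ON Employee (EmployeeTypeId)")

in_plan = query_plan(
    conn, "SELECT EmployeeId FROM Employee WHERE EmployeeTypeId IN (1, 2, 3)"
)
print(in_plan)  # expect a SEARCH on Index_EmployeeTypeId
```

The table here is empty and has no statistics. As the answers say, with real data in which the listed values cover most rows, the planner may choose a scan instead.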
Unless technology has improved in ways I can't imagine of late, the "IN" query shown will produce a result that's effectively the OR-ing of three result sets, one for each of the values in the "IN" list. The IN clause becomes an equality condition for each of the list and will use an index if appropriate. In the case of unique IDs and a large enough table then I'd expect the optimiser to use an index.
If the items in the list were to be non-unique however, and I guess in the example that a "TypeId" is a foreign key, then I'm more interested in the distribution. I'm wondering if the optimiser will check the stats for each value in the list? Say it checks the first value and finds it's in 20% of the rows (of a large enough table to matter). It'll probably table scan. But will the same query plan be used for the other two, even if they're unique?
It's probably moot - something like an Employee table is likely to be small enough that it will stay cached in memory and you probably wouldn't notice a difference between that and indexed retrieval anyway.
And lastly, while I'm preaching, beware the query in the IN clause: it's often a quick way to get something working and (for me at least) can be a good way to express the requirement, but it's almost always better restated as a join. Your optimiser may be smart enough to spot this, but then again it may not. If you don't currently performance-check against production data volumes, do so - in these days of cost-based optimisation you can't be certain of the query plan until you have a full load and representative statistics. If you can't, then be prepared for surprises in production...
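Here is one way to restate an IN subquery as a join, as suggested above. This is a minimal sketch in Python's sqlite3 with hypothetical Employee and Ticket tables. A plain join can return duplicate rows when the joined side has repeated keys (Ann has two open tickets below), which is why the join form needs DISTINCT to match the IN form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Ticket (TicketId INTEGER PRIMARY KEY, EmployeeId INTEGER, IsOpen INTEGER);
    INSERT INTO Employee VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO Ticket VALUES (100, 1, 1), (101, 1, 1), (102, 2, 0), (103, 3, 1);
""")

# Subquery form: employees with at least one open ticket.
in_rows = conn.execute("""
    SELECT Name FROM Employee
    WHERE EmployeeId IN (SELECT EmployeeId FROM Ticket WHERE IsOpen = 1)
    ORDER BY Name
""").fetchall()

# Join form: DISTINCT removes the duplicate produced by Ann's two open tickets.
join_rows = conn.execute("""
    SELECT DISTINCT e.Name
    FROM Employee AS e
    JOIN Ticket AS t ON t.EmployeeId = e.EmployeeId
    WHERE t.IsOpen = 1
    ORDER BY e.Name
""").fetchall()

print(in_rows)    # [('Ann',), ('Cy',)]
print(join_rows)  # [('Ann',), ('Cy',)]
```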
So there's the potential for an "IN" clause to run a table scan, but the optimizer will try to work out the best way to deal with it?
Whether an index is used depends less on the type of query than on the type and distribution of data in the table(s), how up to date your table statistics are, and the actual data type of the column.
The other posters are correct that an index will be used over a table scan if:
The query won't access more than a certain percent of the rows indexed (say ~10% but should vary between DBMS's).
Alternatively, if there are a lot of rows, but relatively few unique values in the column, it also may be faster to do a table scan.
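The rule of thumb in the list above can be written down as a toy function. This is only an illustration: the 10% threshold is the rough figure quoted above, and real optimizers weigh statistics, row width, lookup costs and caching rather than applying a single cutoff. The function name is hypothetical.

```python
def index_likely_helps(matching_rows: int, total_rows: int, threshold: float = 0.10) -> bool:
    """Toy rule of thumb: an index seek tends to win when few rows match.

    Real optimizers estimate costs from statistics; this only compares the
    fraction of matching rows with a fixed cutoff.
    """
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    selectivity = matching_rows / total_rows
    return selectivity <= threshold


# The two cases from the Employee example: 5 of 10,000 rows vs 9,000 of 10,000 rows.
print(index_likely_helps(5, 10_000))      # True
print(index_likely_helps(9_000, 10_000))  # False
```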
The other variable that might not be that obvious is making sure that the datatypes of the values being compared are the same. In PostgreSQL, I don't think that indexes will be used if you're filtering on a float but your column is made up of ints. There are also some operators that don't support index use (again, in PostgreSQL, the ILIKE operator is like this).
As noted though, always check the query analyser when in doubt and your DBMS's documentation is your friend.
@Mike: Thanks for the detailed analysis. There are definitely some interesting points there. The example I posted is somewhat trivial, but the question came from using NHibernate.
With NHibernate, you can write a clause like this:
int[] employeeIds = new int[]{1, 5, 23463, 32523};
NHibernateSession.CreateCriteria(typeof(Employee))
.Add(Restrictions.InG("EmployeeId", employeeIds))
.List<Employee>();
NHibernate then generates a query which looks like
select * from employee where employeeid in (1, 5, 23463, 32523)
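The SQL that NHibernate generates is an ordinary IN list. If you build such a query yourself, a common approach is to use one bound parameter per value so that the values are never pasted into the SQL text. Here is a minimal sketch with Python's sqlite3, where SQLite stands in for SQL Server and select_employees is a hypothetical helper name.

```python
import sqlite3


def select_employees(conn, employee_ids):
    """Run SELECT ... WHERE EmployeeId IN (?, ?, ...) with one placeholder per id."""
    if not employee_ids:
        return []  # an empty IN () is not valid in SQL Server, so short-circuit it
    placeholders = ", ".join("?" for _ in employee_ids)
    sql = (
        "SELECT EmployeeId FROM Employee "
        f"WHERE EmployeeId IN ({placeholders}) ORDER BY EmployeeId"
    )
    return [row[0] for row in conn.execute(sql, list(employee_ids))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Employee VALUES (?)", [(1,), (5,), (7,), (23463,)])

print(select_employees(conn, [1, 5, 23463, 32523]))  # [1, 5, 23463]
```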
So as you and others have pointed out, it looks like there are going to be times where an index will be used or a table scan will happen, but you can't really determine that until runtime.
SELECT EmployeeId FROM Employee USE INDEX (EmployeeTypeId)
This query (MySQL syntax, assuming the index is named EmployeeTypeId) will search using the index you have created. It works for me. Please give it a try.