On what basis are indexes selected - SQL

I have a question: I have a clustered index on (orderid, productid) and a nonclustered index on productid. When I use the following query, it uses a nonclustered index seek on productid, which I expected:
select orderid, productid
from Sales.OrderDetails
where productid =1
order by productid
However, without changing the search arguments, I added the quantity column (qty):
select orderid, productid, qty
from Sales.OrderDetails
where productid =1
order by productid
Now it uses a clustered index scan, and when I force it to use the nonclustered index on productid, performance drops.

Every nonclustered index will contain the clustering key in its leaf level nodes. So your nonclustered index on productid really contains productid and orderid in its leaf level nodes.
So your first query can be satisfied by just looking up the value in the nonclustered index - the leaf level node found will contain both columns that your SELECT requires.
This is NOT the case when you add another column, like qty - now, once the row is found in the nonclustered index, a key lookup into the actual data page is necessary to fetch all the columns your SELECT returns. That is why a clustered index scan may now perform better than a nonclustered index seek plus a key lookup.
I'm pretty sure the second query would use the nonclustered index again once you include the qty column in your index:
CREATE NONCLUSTERED INDEX ix_productId
ON Sales.OrderDetails(productId)
INCLUDE (qty)
because again: now once the leaf-level page in the non-clustered index is found, all necessary columns are present on that page and can be returned to your second query.
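If you want to verify this, comparing logical reads is a quick check; a minimal sketch, reusing the ix_productId index suggested above against the same Sales.OrderDetails table:
SET STATISTICS IO ON;
-- Before INCLUDE (qty): clustered index scan (or seek + key lookup if the nonclustered
-- index is forced) and noticeably more logical reads.
-- After INCLUDE (qty): a covered nonclustered index seek with no lookup.
SELECT orderid, productid, qty
FROM Sales.OrderDetails
WHERE productid = 1
ORDER BY productid;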

It makes sense: when you added the qty column, using the clustered index is ideal because the non-indexed (data) columns are stored right alongside the clustered index key attributes.

Index: "The right index will help optimizer finding the best execution plan"
Selectivity: the density of an index is 1 divided by the number of distinct values. The lower the density, the higher the selectivity and the better the chances that the optimizer will choose the index. Every index also has an associated statistics object, and with the following DBCC command we can see the density (and hence the selectivity) of the statistics and therefore the index:
DBCC SHOW_STATISTICS('table name','statistics name')
It also displays the density of the column combined with the clustered index columns. Low selectivity will make the optimizer ignore the index, so a query can still perform poorly even after an index is created on the search column. The reason could be low selectivity.
Eg:
SELECT sod.OrderQty ,
sod.SalesOrderID ,
sod.SalesOrderDetailID ,
sod.LineTotal
FROM Sales.SalesOrderDetail sod
WHERE sod.OrderQty = 10;
When a WHERE condition is involved, the chances for a seek are high. But in the above query, the execution plan shows a clustered index scan. I created a nonclustered index on OrderQty, and the optimizer still ignores it. The reason is selectivity: OrderQty has only 41 distinct values, so its density is 1/41 ≈ 0.024, meaning a single value matches thousands of rows. With such low selectivity, a seek plus all the key lookups would cost more than the clustered index scan itself.
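For example, assuming the new index is named IX_SalesOrderDetail_OrderQty (the name is illustrative), the density can be checked with:
-- "All density" for the leading column OrderQty comes back around 1/41 ≈ 0.024;
-- multiplied by the table's row count, that is still thousands of matching rows per value.
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', 'IX_SalesOrderDetail_OrderQty');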
Outdated STATISTICS can also make the OPTIMIZER ignore an index.
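A sketch of refreshing them (FULLSCAN is optional and more expensive than the default sampling):
-- Refresh statistics on one table so the optimizer sees current density/histogram data
UPDATE STATISTICS Sales.SalesOrderDetail WITH FULLSCAN;
-- Or refresh every statistics object in the database
EXEC sp_updatestats;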
There are different things a DBA can do to help the optimizer find the best execution plan. One of them is QUERY HINTS. Introducing query hints can help, but it can just as easily hurt performance. An index-related query hint is WITH (INDEX()), which tells the optimizer to use a specific index.
eg: SELECT * FROM table WITH (INDEX (0))            -- by index number (0 forces a clustered index scan, or a table scan on a heap)
SELECT * FROM table WITH (INDEX (indexName))         -- by index name
These things may help you.

Related

SQL Server execution plans not considering the nonclustered index

Below is the query. We have a nonclustered index on the Questionnaire_Number column, but the optimizer is choosing an index scan, and it is taking almost 39% of the query cost.
SELECT
[Country],
[Site],
[Subject],
[Questionnaire],
[Ques],
[Res],
[ReceivedDt],
[CompletedDt],
[CompletedDtlc],
[Questionnaire],
[RecordUpdaChange],
[ChangeRequesiser],
[RecordUpdated],
[CompletedQuestionnaireId],
[EnrollmentDt],
[AppVersion],
[CompletedBY],
[ModeofEntry],
[StudyDay]
FROM [dbo].Report_PerformTest A
JOIN #Questionnaire_Number Q
ON A.Questionnaire_Number = Q.Questionnaire_Number
ORDER BY CountryName, Site, Subject, CompletedDateTimeLc
Execution Plan
Index Details :
CLUSTERED INDEX : CompletedQuestionnaireId;
CREATE NONCLUSTERED INDEX NIX_Dairy_data_Summary
ON [Report_performTest] (SiteID, SubjectNo)
INCLUDE (...)  -- all selected columns
CREATE NONCLUSTERED INDEX NIXQues_Dairy_data_Summary
ON [Report_performTest] (Questionnaire_Number)
INCLUDE (...)  -- all selected columns
Could you please let me know how to avoid the index scan here?
Your indexes are not used because the ORDER BY clause specifies columns that are not part of the key or the INCLUDE clause of your index, such as CountryName.
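A sketch of a covering index for this query, assuming the ORDER BY columns (CountryName, Site, Subject, CompletedDateTimeLc) really exist on the table with those names:
CREATE NONCLUSTERED INDEX NIXQues_Dairy_data_Summary_Covering   -- name illustrative
ON [dbo].[Report_PerformTest] (Questionnaire_Number)
INCLUDE (CountryName, Site, Subject, CompletedDateTimeLc
         /* , plus the remaining columns in the SELECT list */);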

Execution plan showing missing non-clustered index on already partitioned clustered indexes

We have a query where the table is partitioned on column Adate.
Row count: 56,595,943; partition scheme: yearly; number of partitions: 300
Clustered index columns : empid, Adate
Query :
select top 1 Adate
from emp
where empid = 134556 and Adate <= {ts '7485-09-01 00:00:00.0'}
order by Adate desc
The actual execution plan shows a clustered index seek operator taking 93% of the total query cost.
But why is the optimizer recommending a missing index with 92% of cost?
missing index details: Improve query cost:92%
create nonclustered index IDX_NC on dbo.emp([empid], [Adate])
The missing index has an improvement measure of 14,755,268; as per Microsoft, the improvement measure baseline is 1,000,000.
Why is this happening? Do you recommend to have a nonclustered index on already clustered index columns?
Well - consider this:
you do have the clustered index on (empid, adate)
the clustered index contains the whole data, i.e. the leaf-level pages of the clustered index contain the whole data records (all the columns in your table)
If your query uses the clustered index, it might still need to load much more data than is actually needed: the whole record, for every row that matches your criteria.
If you have a non-clustered index on just (empid, Adate), and your query really only requires Adate (in its SELECT list of columns), then this index will be much smaller - it contains only those two columns (none of the overhead of all the other columns, which are not needed for your current query). So scanning this index, or loading these index pages, will load much less data compared to the clustered index.
From that point of view, yes, even having a nonclustered index on the same columns that make up your clustered index can be beneficial for certain query scenarios - that's probably what the SQL Server query optimizer picks up here.
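That is essentially the index the missing-index hint above describes; a commented sketch of the same DDL:
-- Only two narrow key columns, so the TOP (1) ... ORDER BY Adate DESC query becomes a
-- small backward range seek instead of touching the wide clustered-index rows.
CREATE NONCLUSTERED INDEX IDX_NC ON dbo.emp (empid, Adate);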

Optimize the Clustered Index Scan into Clustered Index Seek

There is a scenario where I have a table with 40 columns and I have to select all the data in the table (including all columns). I have created a clustered index on the table, and the execution plan shows a Clustered Index Scan when fetching the full data set from the table.
I know that without any filter or join key, SQL Server will choose a Clustered Index Scan instead of a Clustered Index Seek. But I want to optimize the execution plan by turning the Clustered Index Scan into a Clustered Index Seek. Is there any solution to achieve this? Please share.
Below is the screenshot of the execution plan:
Something is not quite right in the question / request, because what you are asking for will perform badly. I suspect it comes from misunderstanding what a clustered index is.
The clustered index - which is perhaps better described as a clustered table - is the table of data; it is not separate from the table, it is the table. If the data in the table is already ordered by ITEM ID, then the scan is the most efficient access method for your query (especially given the SELECT *) - you do not want a seek in this scenario at all - and I don't believe that is your scenario, because of the sort operator.
If the clustered table is ordered on another field, then you would need an additional nonclustered index to provide the correct order. You would then try to force a plan that does a nonclustered index scan followed by a nested loop into a clustered index seek. That can be achieved using query hints; most likely an INNER LOOP JOIN would cause the seek, but a FORCESEEK hint also exists and can be used.
Performance-wise, this second option is never going to win - you are in effect looking at the tipping-point notion (https://www.sqlskills.com/blogs/kimberly/the-tipping-point-query-answers/)
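A sketch of what forcing that shape can look like, assuming the clustered index is on something other than ITEM_ID and a nonclustered index named ncx_mytable_itemid exists on ITEM_ID (the table and index names are placeholders):
-- The forced nonclustered index gives an ordered index scan, and every row then does a
-- key lookup (a nested loop into the clustered index) to fetch the remaining columns -
-- usually far slower than a single clustered index scan for a full-table SELECT.
SELECT *
FROM dbo.MyTable WITH (INDEX (ncx_mytable_itemid))
ORDER BY ITEM_ID;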
Well, I was trying to achieve the same thing; I wanted an index seek instead of an index scan for my TOP query.
SELECT TOP 5 id FROM mytable
Here is the execution plan being shown for the query:
I even tried the OFFSET ... FETCH NEXT approach; the plan was the same.
To avoid an index scan, I included a fake primary key filter like the one below:
SELECT TOP 5 id FROM mytable where id != 0
I know I won't have a 0 value in my primary key, so I added the filter to the TOP query, and it resolved to an index seek instead of an index scan:
Even though the query plan comparison shows similar operator costs for the index seek and the index scan in this case, I think achieving an index seek this way adds an extra operation for the database, because it has to check whether the id is 0 or not - something we don't need at all if we just want the top few records.

How to create Index for this scenario in SQL Server?

What is the best index on this Item table for the following query?
select
tt.itemlookupcode,
tt.TotalQuantity,
tt.ExtendedPrice,
tt.ExtendedCost,
items.ExtendedDescription,
items.SubDescription1,
dept.Name,
categories.Name,
sup.Code,
sup.SupplierName
from
#temp_tt tt
left join HQMatajer.dbo.Item items
on items.ItemLookupCode=tt.itemlookupcode
left join HQMatajer.dbo.Department dept
ON dept.ID=items.DepartmentID
left join HQMatajer.dbo.Category categories
on categories.ID=items.CategoryID
left join HQMatajer.dbo.Supplier sup
ON sup.ID=items.SupplierID
drop table #temp_tt
I created an index like this:
CREATE NONCLUSTERED INDEX [JFC_ItemLookupCode_DepartmentID_CategoryID_SupplierID_INC_Description_SubDescriptions] ON [dbo].[Item]
(
[DBTimeStamp] ASC,
[ItemLookupCode] ASC,
[DepartmentID] ASC,
[CategoryID] ASC,
[SupplierID] ASC
)
INCLUDE (
[Description],
[SubDescription1]
)
But when I check the execution plan, it picked another index - one that has only the TimeStamp column.
What is the best index for this scenario on that particular table?
The first column in an index should be part of the filtering, otherwise the index will not be used. In your index the first column is DBTimeStamp, and it is not filtered in your query. That is the reason your index is not used.
Also, in the covering index you included [Description] and [SubDescription1], but in the query you selected ExtendedDescription and items.SubDescription1, so there is additional overhead from a key/RID lookup.
Try altering your index like this:
CREATE NONCLUSTERED INDEX [JFC_ItemLookupCode_DepartmentID_CategoryID_SupplierID_INC_Description_SubDescriptions] ON [dbo].[Item]
(
[ItemLookupCode] ASC,
[DepartmentID] ASC,
[CategoryID] ASC,
[SupplierID] ASC
)
INCLUDE (
[ExtendedDescription],
[SubDescription1]
)
Having said that, the optimizer may still go for a scan or choose some other index, depending on how much data is retrieved from the Item table.
I'm not surprised your index isn't used. DBTimeStamp is likely to be highly selective, and is not referenced in your query at all.
You might have forgotten to include an ORDER BY clause in your query that was intended to reference DBTimeStamp. But even then, your query would probably need to scan the entire index, so it may as well scan the actual table.
The only way to make that index 'look enticing' would be to ensure it includes all columns that are used/returned, i.e. you'd need to add ExtendedDescription. The reason this can help is that indexes typically require less storage than the full table, so they're faster to read from disk. But if you're missing columns (in your case ExtendedDescription), then the engine needs to perform an additional lookup into the full table in any case.
I can't comment why the DBTimeStamp column is preferred - you haven't given enough detail. But perhaps it's the CLUSTERED index?
Your index would be almost certain to be used if defined as:
CREATE NONCLUSTERED INDEX [IX_Item_ItemLookupCode]  -- index name illustrative
ON [dbo].[Item]
(
[ItemLookupCode] ASC --The only thing you're actually filtering by
)
INCLUDE (
/* Moving the rest to include is most efficient for the index tree.
And by including ALL used columns, there's no need to perform
extra lookups to the full table.
*/
[DepartmentID],
[CategoryID],
[SupplierID],
[ExtendedDescription],
[SubDescription1]
)
Note, however, that this kind of indexing strategy ('find the best index for each query used') is unsustainable.
You're better off finding 'narrower' indexes that are appropriate for multiple queries.
Every index slows down INSERT and UPDATE queries.
And indexes like this are impacted by more columns than the preferred 'narrower' indexes.
Index choice should focus on the selectivity of columns. I.e. Given a specific value or small range of values, what percentage of data is likely to be selected based on your queries?
In your case, I'd expect ItemLookupCode to be unique per item in the Items table. In other words, indexing by that column without any includes should be sufficient. However, since you're joining to a temp table that could theoretically include all item codes, in some cases it might be better to scan the CLUSTERED INDEX anyway.
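A minimal sketch of that narrower option, assuming ItemLookupCode really is (near-)unique so the handful of key lookups per matched row stays cheap:
CREATE NONCLUSTERED INDEX IX_Item_ItemLookupCode_Narrow   -- name illustrative
ON [dbo].[Item] ([ItemLookupCode]);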

Optimizing my SQL queries - picking the right indexes

I have a basic table as follows.
create table Orders
(
ID INT IDENTITY(1,1) PRIMARY KEY,
Company VARCHAR(3),
ItemID INT,
BoxID INT,
OrderNum VARCHAR(5),
Status VARCHAR(5),
--about 10 more columns, varchars and ints and dates
)
I'm trying to optimize all my SQL since I am getting a fair few deadlocks and some slowness - but I'm no expert on this sort of thing!
I created a few indexes:
Clustered on the ID (Primary Key).
Non-Clustered index on ([ItemID])
Non-Clustered index on ([BoxID])
Non-Clustered index on ([Company],[OrderNum],[Status])
Maybe 1 or 2 more on some other columns
But I'm not 100% happy with the results.
SELECT * FROM Orders WHERE ItemID=100
Gives me an index seek + a key lookup and a Nested loop (Inner join).
I can see why - but I don't know if I should do anything about it. The key lookup is 97% of the batch, which seems bad!
Every query used will pull back every column in the table, but I don't like the idea of including every column in the index.
I'm making a change now to query everything on the [Company] field. Every query will use it, because results should never contain more than one Company value. So they will all change:
SELECT * FROM Orders WHERE ItemID=100 --Old
SELECT * FROM Orders WHERE Company='a' and ItemID=100 --New
But the execution plan of that gives me exactly the same as not including company (which does surprise me!).
Why are the two execution plans above the same? (I have no index on [company] at the moment)
Is it worth adding [Company] to all my indexes, since it seems to make
no difference to the execution plan?
Should I instead just add 1 single index to [Company] and keep the original indexes? - but will that
mean every query will have 2 seeks?
Is it worth 'including' all other columns in my indexes to avoid the
key lookup? (making the index a tonne bigger, but potentially
speeding it up?) i.e.
CREATE NONCLUSTERED INDEX [IX_Orders_MyIndex] ON [Orders]
( [Company] ASC, [OrderNum] ASC, [Status] ASC )
INCLUDE ([ID],[ItemID],[BoxID],
[Column5],[Column6],[Column7],[Column8],[Column9],[Column10],etc)
That seems messy if I did it on 4 or 5 indexes.
Basically I have 4-5 queries which run quite often (some selects and updates) so I want to make it as efficient as possible.
All queries will use the [company] field, and at least one other. How should I go about it?
Any help appreciated :)
In your execution plan, you say that lookup takes 97% of the batch.
In this case it doesn't mean much, because an index seek is very fast and there wasn't that much work to be done overall.
That lookup is actually the step that reads the record itself, found via the index you specified.
Why are the two execution plans above the same? (I have no index on [company] at the moment)
Non-Clustered index on ([Company],[OrderNum],[Status])
This index is most useful when Company, OrderNum and Status all appear in your WHERE clause.
A composite index builds its key by concatenating the columns, so when you pass only Company you effectively have an incomplete key and the rest behaves like a wildcard.
It would look a little like this: key LIKE 'XXX%' - matching only the leading part of the key, which covers a much wider, less selective range of the index and is therefore time consuming.
The optimizer determines that it's preferable to first seek the rows from the ItemID index and then check those for the required company.
Is it worth adding [Company] to all my indexes since it seems to make 0 different to the execution plan?
You should consider having a Company index instead of adding it to all your indexes.
A composite index could speed things up by reducing the number of nested loops, but you have to think it through thoroughly.
The order of the fields you add to such an index is very important; they should be ordered by uniqueness to allow a better seek. Also, you should never add a field that might not be used in a query.
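Following that 'most unique first' rule, a sketch of one possible composite index for the WHERE Company='a' AND ItemID=100 pattern (column names taken from the Orders table above, index name illustrative):
-- ItemID is far more selective than Company, so it leads the key.
CREATE NONCLUSTERED INDEX IX_Orders_ItemID_Company
ON dbo.Orders (ItemID, Company);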
Should I instead just add 1 single index to [Company] and keep the original indexes? - but will that mean every query will have 2 seeks?
Having more than one index seek is not all that bad; they can often be run in parallel, and only the results of both are then matched together.
Is it worth 'including' all other columns in my indexes to avoid the key lookup? (making the index a tonne bigger, but potentially speeding it up?)
It is worth it when only a few fields are ever optional in the WHERE clause, or when you have queries that select only those fields while using the specified index.
Last notes
All indexes are not equal: comparing strings (varchar) is not the same as comparing numbers (integer, datetime, bytes, etc.).
Also, keeping them clean helps a lot; if your indexes are fragmented, they will be next to useless in terms of performance gain.
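A sketch of checking and fixing fragmentation on the Orders table (the thresholds are the commonly quoted rules of thumb, not hard limits):
-- Check fragmentation per index
SELECT i.name, s.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Orders'), NULL, NULL, 'LIMITED') AS s
JOIN sys.indexes AS i ON i.object_id = s.object_id AND i.index_id = s.index_id;
-- Light fragmentation (roughly 5-30%): reorganize
ALTER INDEX ALL ON dbo.Orders REORGANIZE;
-- Heavy fragmentation (over ~30%): rebuild
ALTER INDEX ALL ON dbo.Orders REBUILD;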