SQL Server execution plan not considering the nonclustered index

Below is the query. We have a nonclustered index on the Questionnaire_Number column, but the optimizer is choosing an index scan, and it accounts for almost 39% of the query cost.
SELECT
[Country],
[Site],
[Subject],
[Questionnaire],
[Ques],
[Res],
[ReceivedDt],
[CompletedDt],
[CompletedDtlc],
[Questionnaire],
[RecordUpdaChange],
[ChangeRequesiser],
[RecordUpdated],
[CompletedQuestionnaireId],
[EnrollmentDt],
[AppVersion],
[CompletedBY],
[ModeofEntry],
[StudyDay]
FROM [dbo].Report_PerformTest A
JOIN #Questionnaire_Number Q
ON A.Questionnaire_Number = Q.Questionnaire_Number
ORDER BY CountryName, Site, Subject, CompletedDateTimeLc
Execution plan: [image not included]
Index details:
Clustered index: CompletedQuestionnaireId
CREATE NONCLUSTERED INDEX NIX_Dairy_data_Summary
ON [Report_performTest] (SiteID, SubjectNo)
INCLUDE (all selected columns)
CREATE NONCLUSTERED INDEX NIXQues_Dairy_data_Summary
ON [Report_performTest] (Questionnaire_Number)
INCLUDE (all selected columns)
Could you please let me know how to avoid the index scan on this query?

Your indexes are not used because the ORDER BY clause specifies some columns, like CountryName, that are part of neither the key columns nor the INCLUDE clause of your index.

Related

Why isn't SQL Server using my clustered index and doing a non-clustered index scan?

I have a patient table with a few columns, and a clustered index on column ID and a non-clustered index on column birth.
create clustered index CI_patient on dbo.patient (ID)
create nonclustered index NCI_patient on dbo.patient (birth)
Here are my queries:
select * from patient
select ID from patient
select birth from patient
Looking at the execution plans, the first query uses a clustered index scan (understandable, because the table has a clustered index), and the third uses a nonclustered index scan (also understandable, because this column has a nonclustered index).
My question is: why does the second query use a nonclustered index scan? The ID column has a clustered index, so shouldn't it be a clustered index scan? Any thoughts on this?
Basically, your second query wants to get all ID values from the table (no WHERE clause or anything).
SQL Server can do this two ways:
clustered index scan - basically a full table scan to read all the data from all rows, and extract the ID from each row - would work, but it loads the WHOLE table, one by one
nonclustered index scan - every nonclustered index also stores the clustering column(s) at its leaf level. Since this index is much smaller than the full table, SQL Server needs to load fewer data pages and can therefore provide the answer (all ID values from all rows) faster than with a full table scan (clustered index scan)
The cost-based optimizer in SQL Server just picks the more efficient route to get the answer to the question you've asked with your second query.
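One way to see the difference for yourself is to force each access path with an index hint and compare the logical reads. This is only a sketch: it reuses the index names from the question (CI_patient, NCI_patient), and the actual numbers depend entirely on your data.

```sql
-- Report page reads for each statement in the Messages tab.
SET STATISTICS IO ON;

-- Force the clustered index: reads every data page of the table.
SELECT ID FROM dbo.patient WITH (INDEX(CI_patient));

-- Force the nonclustered index: reads only the (narrower) index pages,
-- which still contain ID because it is the clustering key.
SELECT ID FROM dbo.patient WITH (INDEX(NCI_patient));

SET STATISTICS IO OFF;
```

If the table has several wide columns, the second statement should normally report noticeably fewer logical reads, which is why the optimizer prefers it.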

Execution plan showing missing non-clustered index on already partitioned clustered indexes

We have a query where the table is partitioned on column Adate.
Row count: 56595943, partition scheme - yearly, no of partitions - 300
Clustered index columns : empid, Adate
Query :
select top 1 Adate
from emp
where empid = 134556 and Adate <= {ts '7485-09-01 00:00:00.0'}
order by Adate desc
The actual execution plan returns a clustered index seek operation with 93% of the total query cost on clustered index key.
But why is the optimizer recommending a missing index with 92% of cost?
missing index details: Improve query cost:92%
create nonclustered index IDX_NC on dbo.emp([empid], [Adate])
The missing index has an improvement measure of 14755268, as per Microsoft the improvement measure baseline is 1,000,000
Why is this happening? Would you recommend creating a nonclustered index on columns that already make up the clustered index?
Well - consider this:
you do have the clustered index on (empid, adate)
the clustered index contains the whole data, e.g. the leaf level pages of the clustered index contain the whole data records (all the columns in your table)
If you are searching and the query uses the clustered index, it might still need to load much more data than is actually needed: the whole record, once for every row that matches your criteria.
If you have a non-clustered index on just (empid, Adate), and your query really only requires Adate (in its SELECT list of columns), then this index will be much smaller - it contains only those two columns (none of the overhead of all the other columns, which are not needed for your current query). So scanning this index, or loading these index pages, will load much less data compared to the clustered index.
From that point of view, yes, even having a nonclustered index on the same columns that make up your clustered index can be beneficial for certain query scenarios - that's probably what the SQL Server query optimizer picks up here.
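To check how much smaller such an index really is, you could compare the page counts of all indexes on the table. The query below is a sketch that assumes the table is dbo.emp as in the question; the nonclustered index IDX_NC only appears in the result once it has been created.

```sql
-- Pages used by each index on dbo.emp, summed over all partitions.
SELECT i.name                  AS index_name,
       i.type_desc             AS index_type,
       SUM(ps.used_page_count) AS used_pages
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
    ON  ps.object_id = i.object_id
    AND ps.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID(N'dbo.emp')
GROUP BY i.name, i.type_desc
ORDER BY used_pages DESC;
```

If the clustered index uses far more pages than the two-column nonclustered index, that gap is what the missing-index suggestion is trying to exploit. Whether the extra write and storage cost is worth it is a separate decision.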

Optimize the Clustered Index Scan into Clustered Index Seek

Here is the scenario: I have a table with 40 columns, and I have to select all the data in the table (including all columns). I have created a clustered index on the table, and the plan shows a Clustered Index Scan when fetching the full data set from the table.
I know that without any filter or join key, SQL Server will choose a Clustered Index Scan instead of a Clustered Index Seek. But I want to optimize the execution plan by turning the Clustered Index Scan into a Clustered Index Seek. Is there any way to achieve this? Please share.
[Screenshot of the execution plan not included]
Something is not quite right in the question / request, because what you are asking for will perform badly. I suspect it comes from mis-understanding what a clustered index is.
The clustered index - which is perhaps better described as a clustered table - is the table of data; it's not separate from the table, it is the table. If the data in the table is already ordered by ITEM ID, then the scan is the most efficient access method for your query (especially given the select *). You do not want a seek in this scenario at all, and I don't believe that it is your scenario, because of the sort operator.
If the clustered table is ordered by another field, then you would need an additional nonclustered index to provide the correct order. You would then try to force a plan consisting of a nonclustered index scan feeding a nested loops join into a clustered index seek. That can be achieved using hints: most likely an INNER LOOP JOIN would cause the seek, but a FORCESEEK hint also exists and can be used.
Performance-wise, this second option is never going to win; you are in effect looking at the tipping point notion (https://www.sqlskills.com/blogs/kimberly/the-tipping-point-query-answers/)
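Purely as an illustration of that forced plan shape (not a recommendation), here is a minimal sketch. All names are hypothetical and not taken from the question: a table dbo.Item with a clustered primary key on ItemID and a nonclustered index IX_Item_Name on a Name column.

```sql
-- Hypothetical schema: dbo.Item, clustered PK on ItemID,
-- nonclustered index IX_Item_Name on (Name).
SELECT t.*
FROM dbo.Item AS k WITH (INDEX(IX_Item_Name))  -- outer input: ordered nonclustered index scan
INNER LOOP JOIN dbo.Item AS t                  -- join hint: nested loops, join order as written
    ON t.ItemID = k.ItemID                     -- inner input: lookup by the clustering key
ORDER BY k.Name;
```

With the join hint in place, the inner side would typically be resolved as a clustered index seek per outer row. For a query that returns every row of the table, that is one seek per row, which is why the plain scan is expected to be cheaper.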
Well, I was trying to achieve the same thing: I wanted an index seek instead of an index scan on my TOP query.
SELECT TOP 5 id FROM mytable
The execution plan for this query showed an index scan (screenshot not included).
I even tried the OFFSET ... FETCH NEXT approach; the plan was the same.
To avoid the index scan, I added a fake filter on the primary key, as below:
SELECT TOP 5 id FROM mytable where id != 0
I know I won't have a 0 value in my primary key, so I added that predicate to the TOP query, and the plan changed to an index seek instead of an index scan (screenshot not included).
However, the query plan comparison shows a similar operator cost for the index seek and the index scan in this case. I also think that getting an index seek this way means extra work for the database, because it has to compare each id with 0, which is entirely unnecessary if we only want the top few records.

How to create Index for this scenario in SQL Server?

What is the best index on the Item table for the following query?
select
tt.itemlookupcode,
tt.TotalQuantity,
tt.ExtendedPrice,
tt.ExtendedCost,
items.ExtendedDescription,
items.SubDescription1,
dept.Name,
categories.Name,
sup.Code,
sup.SupplierName
from
#temp_tt tt
left join HQMatajer.dbo.Item items
on items.ItemLookupCode=tt.itemlookupcode
left join HQMatajer.dbo.Department dept
ON dept.ID=items.DepartmentID
left join HQMatajer.dbo.Category categories
on categories.ID=items.CategoryID
left join HQMatajer.dbo.Supplier sup
ON sup.ID=items.SupplierID
drop table #temp_tt
I created an index like this:
CREATE NONCLUSTERED INDEX [JFC_ItemLookupCode_DepartmentID_CategoryID_SupplierID_INC_Description_SubDescriptions] ON [dbo].[Item]
(
[DBTimeStamp] ASC,
[ItemLookupCode] ASC,
[DepartmentID] ASC,
[CategoryID] ASC,
[SupplierID] ASC
)
INCLUDE (
[Description],
[SubDescription1]
)
But when I check the execution plan, the optimizer has picked another index, one that has only the TimeStamp column.
What is the best index on that table for this scenario?
The first (leading) key column of an index should appear in a filter or join predicate; otherwise the index cannot be used for a seek. In your index the first column is DBTimeStamp, and your query does not filter on it. That is the reason your index is not used.
Also, your covering index includes [Description] and [SubDescription1], but the query selects ExtendedDescription and items.SubDescription1; this adds the overhead of a key/RID lookup.
Try altering your index like this:
CREATE NONCLUSTERED INDEX [JFC_ItemLookupCode_DepartmentID_CategoryID_SupplierID_INC_Description_SubDescriptions] ON [dbo].[Item]
(
[ItemLookupCode] ASC,
[DepartmentID] ASC,
[CategoryID] ASC,
[SupplierID] ASC
)
INCLUDE (
[ExtendedDescription],
[SubDescription1]
)
Having said all that, the optimizer may still go for a scan or choose some other index, depending on how much data is retrieved from the Item table.
I'm not surprised your index isn't used. DBTimeStamp is likely to be highly selective, and is not referenced in your query at all.
You might have forgotten to include an ORDER BY clause in your query that was intended to reference DBTimeStamp. But even then your query would probably need to scan the entire index, so it may as well scan the actual table.
The only way to make that index 'look enticing' would be to ensure it includes all columns that are used/returned. I.e. You'd need to add ExtendedDescription. The reason this can help is that indexes typically require less storage than the full table. So it's faster to read from disk. But if you're missing columns (in your case ExtendedDescription), then the engine needs to perform an additional lookup onto the full table in any case.
I can't comment why the DBTimeStamp column is preferred - you haven't given enough detail. But perhaps it's the CLUSTERED index?
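Your index would be almost certain to be used if defined as:
CREATE NONCLUSTERED INDEX [IX_Item_ItemLookupCode_Covering] --Placeholder name, pick your own
ON [dbo].[Item]
(
[ItemLookupCode] ASC --The only thing you're actually filtering by
)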
INCLUDE (
/* Moving the rest to include is most efficient for the index tree.
And by including ALL used columns, there's no need to perform
extra lookups to the full table.
*/
[DepartmentID],
[CategoryID],
[SupplierID],
[ExtendedDescription],
[SubDescription1]
)
Note, however, that this kind of indexing strategy ('find the best index for each query') is unsustainable.
You're better off finding 'narrower' indexes that are appropriate for multiple queries.
Every index slows down INSERT and UPDATE queries.
And wide indexes like this are affected by changes to more columns than the preferred 'narrower' indexes.
Index choice should focus on the selectivity of columns, i.e. given a specific value or small range of values, what percentage of the data is likely to be selected by your queries?
In your case, I'd expect ItemLookupCode to be unique per item in the Item table. In other words, indexing by that column without any includes should be sufficient. However, since you're joining to a temp table that could theoretically include all item codes, in some cases it might be better to scan the CLUSTERED INDEX anyway.
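A narrow index along those lines could be sketched as follows. The index name is a placeholder, and the index is deliberately not declared UNIQUE, because uniqueness of ItemLookupCode is only an expectation here, not something the question confirms.

```sql
-- Placeholder index name; key column only, no INCLUDE list.
-- Add UNIQUE only if ItemLookupCode is guaranteed unique in dbo.Item.
CREATE NONCLUSTERED INDEX IX_Item_ItemLookupCode
    ON dbo.Item (ItemLookupCode);
```

With this index, each matching row still needs a lookup into the base table for the other columns. That is cheap when the temp table holds few codes and expensive when it holds most of them, which is the case where a clustered index scan can win.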

On what basis indexes are selected

I have a question: I have a clustered index on orderid,productid and a non-clustered index on productid. When I am using the following query, it uses the nonclustered index seek on productid, which I expected:
select orderid, productid
from Sales.OrderDetails
where productid =1
order by productid
However, without changing the search arguments, I added the Quantity:
select orderid, productid, qty
from Sales.OrderDetails
where productid =1
order by productid
Now it uses a clustered index scan, and when I force the nonclustered index (productid), performance drops.
Every nonclustered index will contain the clustering key in its leaf level nodes. So your nonclustered index on productid really contains productid and orderid in its leaf level nodes.
So your first query can be satisfied by just looking up the value in the nonclustered index - the leaf level node found will contain both columns that your SELECT requires.
This is NOT the case when you add another column, like qty - now, once found, a key lookup into the actual data page is necessary to get all the columns to be returned from your SELECT query. So therefore, maybe now a clustered index scan is performing better than a nonclustered index seek and a key lookup.
I'm pretty sure the second query would use the nonclustered index again once you include the qty column in your index:
CREATE NONCLUSTERED INDEX ix_productId
ON Sales.OrderDetails(productId)
INCLUDE (qty)
because again: now once the leaf-level page in the non-clustered index is found, all necessary columns are present on that page and can be returned to your second query.
It makes sense: once you added the qty column, using the clustered index is ideal, because the non-indexed (data) columns are stored together with the clustered index key columns.
Index: "The right index will help the optimizer find the best execution plan."
Selectivity: the density of an index = 1 divided by the number of distinct values. The lower the density, the higher the selectivity, and the higher the chance that the optimizer chooses the index. Every index also has a statistics object. With the following DBCC command we can see the density recorded in a statistics object, and thereby the selectivity of the index.
DBCC SHOW_STATISTICS('table name','statistics name')
It also displays the density of the column when combined with the clustered index columns. Low selectivity will make the optimizer ignore the index. A query may still perform poorly even after an index is created on the search column; the reason could be low selectivity.
Eg:
SELECT sod.OrderQty ,
sod.SalesOrderID ,
sod.SalesOrderDetailID ,
sod.LineTotal
FROM Sales.SalesOrderDetail sod
WHERE sod.OrderQty = 10;
When a WHERE condition is involved, the chances of a seek are high. But for the above query, the execution plan shows a clustered index scan. I created a nonclustered index on OrderQty, and the optimizer still ignores it. The reason is that the density of OrderQty is 1/41 distinct values = ~0.024, so an average value matches roughly 2.4% of the rows. With such low selectivity, the optimizer finds the index no cheaper than the scan itself.
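The density figure can be checked directly against the data. The sketch below assumes the AdventureWorks table from the example. The statistics name IX_SalesOrderDetail_OrderQty is hypothetical and stands for whatever index or statistics object exists on OrderQty in your database.

```sql
-- Density computed from the data: 1 / number of distinct values.
SELECT COUNT(DISTINCT OrderQty)       AS distinct_values,
       1.0 / COUNT(DISTINCT OrderQty) AS density
FROM Sales.SalesOrderDetail;

-- Density as recorded in the statistics object (hypothetical name).
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', IX_SalesOrderDetail_OrderQty)
    WITH DENSITY_VECTOR;
```

The two values should roughly agree when the statistics are current; a large mismatch suggests the statistics are stale or were built from a small sample.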
Outdated statistics can also make the optimizer ignore the index.
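A quick way to see how old the statistics are, and to refresh them, might look like this. It again assumes the AdventureWorks table from the example; FULLSCAN is just one option and can be expensive on large tables.

```sql
-- When was each statistics object on the table last updated?
SELECT s.name                              AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'Sales.SalesOrderDetail');

-- Rebuild all statistics on the table by reading every row.
UPDATE STATISTICS Sales.SalesOrderDetail WITH FULLSCAN;
```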
There are different things a DBA can do to help the optimizer find the best execution plan. One of them is hints. Introducing hints can help, but it can also hurt performance. An index-related table hint is WITH (INDEX(...)), which tells the optimizer to use a specific index.
e.g. (dbo.MyTable and indexName are placeholders):
SELECT * FROM dbo.MyTable WITH (INDEX(0)); -- by index ID; 0 forces a scan of the heap or clustered index
SELECT * FROM dbo.MyTable WITH (INDEX(indexName)); -- by index name
These things may help you.