Execution plan suggests a missing nonclustered index on an already partitioned clustered index

We have a query on a table that is partitioned on column Adate.
Row count: 56,595,943; partition scheme: yearly; number of partitions: 300
Clustered index columns: empid, Adate
Query :
select top 1 Adate
from emp
where empid = 134556 and Adate <= {ts '7485-09-01 00:00:00.0'}
order by Adate desc
The actual execution plan shows a clustered index seek on the clustered index key, accounting for 93% of the total query cost.
So why is the optimizer recommending a missing index with an estimated 92% improvement?
Missing index details: Improve query cost: 92%
create nonclustered index IDX_NC on dbo.emp([empid], [Adate])
The missing index has an improvement measure of 14,755,268; according to Microsoft, the baseline for the improvement measure is 1,000,000.
Why is this happening? Would you recommend creating a nonclustered index on columns that already make up the clustered index?

Well - consider this:
You do have the clustered index on (empid, Adate).
The clustered index contains all of the data, i.e. the leaf-level pages of the clustered index contain the complete data records (all the columns in your table).
If your query uses the clustered index, it might still need to load much more data than is actually needed: the whole record, once for every row that matches your criteria.
If you have a nonclustered index on just (empid, Adate), and your query really only requires Adate (in its SELECT list), then this index will be much smaller. It contains only those two columns (none of the overhead of all the other columns, which your current query doesn't need), so scanning this index, or loading its pages, reads much less data than the clustered index.
From that point of view, yes, even having a nonclustered index on the same columns that make up your clustered index can be beneficial for certain query scenarios - that's probably what the SQL Server query optimizer picks up here.
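To put rough numbers on the "smaller index, fewer pages" argument, here is a small Python model (not SQL Server code). It assumes about 8,060 usable bytes per 8 KB page, ignores per-row overhead and the non-leaf B-tree levels, and uses hypothetical row sizes: a wide 400-byte table row versus a 20-byte (empid, Adate) index row. Treat it as a back-of-the-envelope sketch, not as how the optimizer actually costs plans.

```python
import math

# Simplified model: roughly 8,060 usable bytes per 8 KB page; per-row overhead
# and non-leaf B-tree pages are ignored.
PAGE_BYTES = 8060


def leaf_pages_read(matching_rows: int, row_bytes: int) -> int:
    """Leaf pages touched when the matching rows sit next to each other in key order."""
    rows_per_page = PAGE_BYTES // row_bytes
    return math.ceil(matching_rows / rows_per_page)


# Hypothetical sizes: a wide table row vs. a narrow (empid, Adate) index row.
WIDE_ROW_BYTES = 400
NARROW_ROW_BYTES = 20

if __name__ == "__main__":
    matches = 10_000
    print("clustered index leaf pages:", leaf_pages_read(matches, WIDE_ROW_BYTES))
    print("narrow nonclustered index leaf pages:", leaf_pages_read(matches, NARROW_ROW_BYTES))
```

With these made-up sizes, the narrow index touches 25 leaf pages instead of 500 for the same 10,000 matching rows; the ratio simply tracks the ratio of row widths.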

Related

Why isn't SQL Server using my clustered index and doing a non-clustered index scan?

I have a patient table with a few columns, and a clustered index on column ID and a non-clustered index on column birth.
create clustered index CI_patient on dbo.patient (ID)
create nonclustered index NCI_patient on dbo.patient (birth)
Here are my queries:
select * from patient
select ID from patient
select birth from patient
Looking at the execution plans, the first query uses a 'clustered index scan' (understandable, because the table is a clustered table), and the third uses an 'index scan nonclustered' (also understandable, because this column has a nonclustered index).
My question is: why does the second one use 'index scan nonclustered'? This column is supposed to have the clustered index, so shouldn't that be a clustered index scan? Any thoughts on this?
Basically, your second query wants to get all ID values from the table (no WHERE clause or anything).
SQL Server can do this in two ways:
a clustered index scan: basically a full table scan that reads all the data from all rows and extracts the ID from each row. This works, but it loads the WHOLE table, page by page
a scan across the nonclustered index, because every nonclustered index also includes the clustering column(s) at its leaf level. Since this index is much smaller than the full table, SQL Server needs to load fewer data pages and can therefore provide the answer (all ID values from all rows) faster than with a full table scan (clustered index scan)
The cost-based optimizer in SQL Server just picks the more efficient route to get the answer to the question you've asked with your second query.
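The choice can be sketched as a tiny Python model: keep only the access paths whose stored columns cover the query, then take the one with the fewest pages. The page counts and the extra columns name and address are hypothetical, and only covering paths are considered because, for an unfiltered read of every row, adding a lookup per row would almost certainly not pay off. The real optimizer's cost model is far more detailed.

```python
def cheapest_covering_path(needed_columns, access_paths):
    """Pick the access path with the fewest pages among those that store every needed column."""
    covering = [p for p in access_paths if set(needed_columns) <= p["columns"]]
    return min(covering, key=lambda p: p["pages"])["name"]


# Hypothetical page counts for the patient table. The nonclustered index leaf
# holds its key (birth) plus the clustering key (ID); name/address are made up.
PATIENT_PATHS = [
    {"name": "CI_patient", "columns": {"ID", "birth", "name", "address"}, "pages": 1000},
    {"name": "NCI_patient", "columns": {"birth", "ID"}, "pages": 50},
]

if __name__ == "__main__":
    print(cheapest_covering_path(["ID"], PATIENT_PATHS))          # second query
    print(cheapest_covering_path(["birth"], PATIENT_PATHS))       # third query
    print(cheapest_covering_path(["ID", "name"], PATIENT_PATHS))  # needs a non-indexed column
```

Only when the query asks for a column that the narrow index does not carry does the wide clustered index become the cheapest path that still covers it.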

Join large table to small in SQL Server with clustered columnstore index takes too long

I am experiencing very slow performance (35 sec) when joining 2 tables: one has 39M rows, the other 10k. This runs on an Azure SQL Premium instance, which is a very decent server.
select m39.*
from [Table_With_39M_Rows] m39
inner join [Table_With_10K_Rows] k10 on m39.[Id] = k10.[Id]
Even a count(*) takes around 10 seconds:
select count(*)
from [Table_With_39M_Rows] m39
inner join [Table_With_10K_Rows] k10 on m39.[Id] = k10.[Id]
Here are the table details:
Table [Table_With_39M_Rows] has around 39 million rows (50 columns) with a clustered columnstore index:
CREATE CLUSTERED COLUMNSTORE INDEX CCI_Table_With_39M_Rows
ON Table_With_39M_Rows
CREATE UNIQUE NONCLUSTERED INDEX UNCI_Table_With_39M_Rows_Id
ON Table_With_39M_Rows (Id ASC)
Table [Table_With_10K_Rows] has around 10k rows (50 columns) and Id as the primary key
ALTER TABLE Table_With_10K_Rows
ADD CONSTRAINT PK_Table_With_10K_Rows
PRIMARY KEY CLUSTERED([Id] ASC)
The Clustered Columnstore Index Scan takes 99% of the cost and slows everything down.
How can I optimize this particular join? What indexing strategy should I employ?
Clustered columnstore indexes are helpful if rowgroup elimination works (you can think of this as skipping entire segments of rows that don't satisfy the predicate) and if queries are analytical in nature.
To check whether segment elimination is occurring, you can look at the STATISTICS IO output (SET STATISTICS IO ON), which reports segment reads and skips per table.
Below is a sample demo with a query of my own (since we don't have your test data), which may help you understand more:
query:
select s.* from sales s
join
numbers n
on n.number=s.id
The numbers table has only 65356 rows and the sales table has more than 3 million rows. Each segment can hold at most about one million rows (1,048,576). In the STATISTICS IO output, SQL Server reads 2 segments (about 2 million rows) and skips 2, which is not great: I would expect only one segment to be read and the remaining three to be skipped. But 2 are read, as shown below:
Table 'sales'. Segment reads 2, segment skipped 2.
This is probably happening because the clustered columnstore index was created from a heap, so try the following.
Drop your existing clustered columnstore index; in my case it is:
drop index nci on sales
Now create a clustered (rowstore) index first and the clustered columnstore index next. This helps SQL Server insert the rows into the clustered columnstore index in order. You might also want to use MAXDOP = 1 to avoid parallelism, which can produce unordered rows:
create clustered index nci on sales(id)
create clustered columnstore index nci on sales
with (drop_existing=on,maxdop =1)
If you run the query now, you can see that segment elimination occurs and the query is fast:
Table 'sales'. Segment reads 1, segment skipped 2.
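Why load order matters can be shown with a scaled-down Python model (not SQL Server code): each segment keeps only the min/max of its values, and a segment can be skipped when that range cannot overlap the values being looked for. The sizes are hypothetical (4,000 ids, 1,000 rows per segment), and the model treats the numbers table simply as the range 1..655; the engine's actual mechanics for eliminating segments during a join are more involved.

```python
def build_segments(ids, segment_size):
    """Cut ids (in load order) into segments, keeping only each segment's min/max metadata."""
    return [
        (min(ids[i:i + segment_size]), max(ids[i:i + segment_size]))
        for i in range(0, len(ids), segment_size)
    ]


def segment_reads_and_skips(segments, low, high):
    """A segment must be read only if its [min, max] range overlaps [low, high]."""
    reads = sum(1 for seg_min, seg_max in segments if seg_max >= low and seg_min <= high)
    return reads, len(segments) - reads


SEGMENT_SIZE = 1000
ordered_ids = list(range(1, 4001))
# Hypothetical unordered load (e.g. from a heap or a parallel build): odds first, then evens.
unordered_ids = list(range(1, 4001, 2)) + list(range(2, 4001, 2))

if __name__ == "__main__":
    print(segment_reads_and_skips(build_segments(unordered_ids, SEGMENT_SIZE), 1, 655))
    print(segment_reads_and_skips(build_segments(ordered_ids, SEGMENT_SIZE), 1, 655))
```

In the unordered load, two segments each span almost the whole id range, so both must be read (2 read, 2 skipped); after loading in id order, only the first segment overlaps (1 read, 3 skipped).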
References and further reading:
https://www.sqlpassion.at/archive/2017/01/30/columnstore-segment-elimination/
https://blogs.msdn.microsoft.com/sqlserverstorageengine/2016/07/17/columnstore-index-how-do-they-defer-from-traditional-btree-indices-on-rowstore-tables/
https://blogs.msdn.microsoft.com/sql_server_team/columnstore-index-performance-rowgroup-elimination/
I suggest you be consistent in your use of [].
ID for a foreign key is not a good name.
Columnstore Indexes Described
Columnstore indexes give high performance gains for queries that use full table scans, and are not well-suited for queries that seek into the data, searching for a particular value.
Just because you need columnstore for other purposes does not make it a good fit for this query.
Try a regular nonclustered index on [Table_With_39M_Rows].[ID].
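The seek-versus-scan argument can be illustrated with a toy Python model (not SQL Server code): a nested loops join that does one binary-search "seek" per outer key, plus a rough work estimate. Binary search stands in for a B-tree descent (a real B-tree with hundreds of keys per page is only a few pages deep), the estimate counts comparisons rather than I/O, and it ignores the cost of fetching the remaining columns that m39.* needs.

```python
import math
from bisect import bisect_left


def index_seek_join(outer_keys, inner_sorted_keys):
    """Nested loops join: one binary-search 'seek' into the sorted inner keys per outer key."""
    matches = []
    for key in outer_keys:
        pos = bisect_left(inner_sorted_keys, key)
        if pos < len(inner_sorted_keys) and inner_sorted_keys[pos] == key:
            matches.append(key)
    return matches


def approx_comparisons(outer_rows, inner_rows):
    """Rough work estimate: (seek-based join, full scan of the inner table)."""
    seek = outer_rows * math.ceil(math.log2(inner_rows))
    scan = inner_rows
    return seek, scan


if __name__ == "__main__":
    print(index_seek_join([4, 7, 998, 1000], list(range(0, 1000, 2))))
    print(approx_comparisons(10_000, 39_000_000))
```

For 10k outer rows against 39M inner rows the estimate is about 260,000 comparisons for the seeks versus 39,000,000 rows for the scan, which is why a plain B-tree seek is worth trying for this particular join.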

Index Decreases Number of Rows Read; No performance Gain

I created a nonclustered, non-unique index on a column (date) of a large table (16 million rows), but I get very similar query speeds compared to the exact same query forced not to use any index.
Query 1 (uses index):
SELECT *
FROM testtable
WHERE date BETWEEN '01/01/2017' AND '03/01/2017'
ORDER BY date
Query 2 (no index):
SELECT *
FROM testtable WITH(INDEX(0))
WHERE date BETWEEN '01/01/2017' AND '03/01/2017'
ORDER BY date
Both queries take the same amount of time to run and return the same result. Looking at the execution plan for each, Query 1 reads ~4 million rows, whereas Query 2 reads 106 million rows. It appears that the index is working, but I'm not gaining any performance benefit from it.
Any ideas as to why this is, or how to increase my query speed in this case, would be much appreciated.
Create Indexes with Included Columns: covering index
This topic describes how to add included (or nonkey) columns to extend the functionality of nonclustered indexes in SQL Server by using SQL Server Management Studio or Transact-SQL. By including nonkey columns, you can create nonclustered indexes that cover more queries. This is because the nonkey columns have the following benefits:
They can be data types not allowed as index key columns.
They are not considered by the Database Engine when calculating the number of index key columns or index key size.
An index with nonkey columns can significantly improve query performance when all columns in the query are included in the index either as key or nonkey columns. Performance gains are achieved because the query optimizer can locate all the column values within the index; table or clustered index data is not accessed resulting in fewer disk I/O operations.
CREATE NONCLUSTERED INDEX IX_your_index_name
ON testtable (date)
INCLUDE (col1,col2,col3);
GO
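A rough Python model (not SQL Server code) of why the plain index on date need not be faster for SELECT *: every matching row found through the narrow index still needs a key lookup into the clustered index. All sizes are assumptions: 400-byte table rows, 20-byte index rows, a clustered index 3 levels deep, and 100-byte rows for a covering index when the query is narrowed to the included columns. It counts logical page reads, not elapsed time, and many lookup pages would be cached.

```python
import math

PAGE_BYTES = 8060  # rough usable bytes per 8 KB page; row overhead ignored


def pages(rows, row_bytes):
    return math.ceil(rows / (PAGE_BYTES // row_bytes))


def clustered_scan_reads(total_rows, row_bytes):
    # Read every leaf page of the clustered index (the whole table).
    return pages(total_rows, row_bytes)


def seek_with_lookups_reads(matching_rows, index_row_bytes, clustered_depth):
    # Leaf pages of the narrow index plus one root-to-leaf descent of the
    # clustered index per matching row (the key lookups).
    return pages(matching_rows, index_row_bytes) + matching_rows * clustered_depth


def covering_seek_reads(matching_rows, covering_row_bytes):
    # All requested columns live in the index leaf, so no lookups are needed.
    return pages(matching_rows, covering_row_bytes)


if __name__ == "__main__":
    print(clustered_scan_reads(16_000_000, 400))
    print(seek_with_lookups_reads(4_000_000, 20, 3))
    print(covering_seek_reads(4_000_000, 100))
```

Under these assumptions the scan reads 800,000 pages, the seek plus lookups about 12 million logical reads, and the covering index 50,000 pages, which is the point of building the index around the columns the query actually needs.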
You need to build an index around the needs of your query. This quick, free video course should bring you up to speed quickly:
https://www.brentozar.com/archive/2016/10/think-like-engine-class-now-free-open-source/

On what basis are indexes selected?

I have a clustered index on (orderid, productid) and a nonclustered index on productid. When I run the following query, it uses a nonclustered index seek on productid, which I expected:
select orderid, productid
from Sales.OrderDetails
where productid =1
order by productid
However, without changing the search arguments, I added the qty column:
select orderid, productid, qty
from Sales.OrderDetails
where productid =1
order by productid
Now it uses a clustered index scan, and when I force it to use the nonclustered index on productid, performance drops.
Every nonclustered index will contain the clustering key in its leaf level nodes. So your nonclustered index on productid really contains productid and orderid in its leaf level nodes.
So your first query can be satisfied by just looking up the value in the nonclustered index - the leaf level node found will contain both columns that your SELECT requires.
This is NOT the case when you add another column such as qty: once the row is found, a key lookup into the actual data page is needed to get all the columns your SELECT returns. Therefore, a clustered index scan may now perform better than a nonclustered index seek plus key lookups.
I'm pretty sure the second query would use the nonclustered index again once you include the qty column in your index:
CREATE NONCLUSTERED INDEX ix_productId
ON Sales.OrderDetails(productId)
INCLUDE (qty)
because again: now once the leaf-level page in the non-clustered index is found, all necessary columns are present on that page and can be returned to your second query.
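The rule the answer relies on can be written as a small Python check (a model, not SQL Server code): the leaf of a nonclustered index on a clustered table carries the index key, the clustering key, and any INCLUDE columns, and a key lookup is needed whenever the SELECT list asks for anything else. Column names come from the question; the functions are illustrative.

```python
def leaf_columns(index_key, clustering_key, included=()):
    """Columns stored at the leaf level of a nonclustered index on a clustered table."""
    return set(index_key) | set(clustering_key) | set(included)


def needs_key_lookup(selected, index_key, clustering_key, included=()):
    """True when the SELECT list asks for a column the index leaf does not carry."""
    return not set(selected) <= leaf_columns(index_key, clustering_key, included)


CLUSTERING_KEY = ["orderid", "productid"]

if __name__ == "__main__":
    # First query: covered, because the clustering key comes along in the leaf.
    print(needs_key_lookup(["orderid", "productid"], ["productid"], CLUSTERING_KEY))
    # Second query: qty is not in the leaf, so every row needs a key lookup.
    print(needs_key_lookup(["orderid", "productid", "qty"], ["productid"], CLUSTERING_KEY))
    # With INCLUDE (qty), the index covers the second query again.
    print(needs_key_lookup(["orderid", "productid", "qty"], ["productid"], CLUSTERING_KEY, ["qty"]))
```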
It makes sense: once you add the qty column, using the clustered index is ideal, because the non-key (data) columns are stored together with the clustered index key.
Index: "The right index will help the optimizer find the best execution plan"
Selectivity: the density of an index = 1 / (number of distinct values). The lower the density, the higher the selectivity, and the higher the chance that the optimizer chooses the index. Every index also has a statistics object; with the following DBCC command, we can see the density of a statistics object, and thereby of the index:
DBCC SHOW_STATISTICS('table name','statistics name')
It also displays the density of the column when combined with the clustered index columns. Low selectivity makes the optimizer ignore the index: a query may still perform poorly even after you create an index on the search column, and the reason can be low selectivity.
Eg:
SELECT sod.OrderQty ,
sod.SalesOrderID ,
sod.SalesOrderDetailID ,
sod.LineTotal
FROM Sales.SalesOrderDetail sod
WHERE sod.OrderQty = 10;
When a WHERE condition is involved, the chance of a seek is high. But for the query above, the execution plan shows a Clustered Index Scan. Even after creating a nonclustered index on OrderQty, the optimizer still ignores the index. The reason: OrderQty has only 41 distinct values, so its density is 1/41 ≈ 0.024, meaning each value matches about 2.4% of the rows on average. With selectivity that poor, the optimizer finds the index seek no better than scanning the clustered index.
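The density arithmetic can be reproduced with a few lines of Python (a model of the "All density" value reported by DBCC SHOW_STATISTICS, not a call into SQL Server). The list of 41 OrderQty values is hypothetical and stands in for the real column.

```python
def all_density(values):
    """1 / number of distinct values, like the 'All density' column of DBCC SHOW_STATISTICS."""
    return 1 / len(set(values))


def avg_rows_per_value(values):
    """Average rows per distinct value (equivalent to total rows * density)."""
    return len(values) / len(set(values))


if __name__ == "__main__":
    order_qty_values = list(range(1, 42))  # hypothetical: 41 distinct OrderQty values
    print(round(all_density(order_qty_values), 4))
    print(avg_rows_per_value([1, 1, 2, 3, 3, 3]))
```

A density of 1/41 means an equality predicate is estimated to return roughly 2.4% of the table, which is usually far more rows than a seek plus per-row lookups can handle cheaply.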
Outdated STATISTICS can also make the OPTIMIZER ignore the index.
There are different things a DBA can do to help the optimizer find the best execution plan. One of them is QUERY HINTS. Introducing hints can help, but they can also hurt performance. An index-related (table) hint is WITH (INDEX(...)).
We can tell the optimizer to use a specific index, for example:
SELECT * FROM YourTable WITH (INDEX (0)) -- by index ID; 0 forces a scan (clustered index scan, or table scan on a heap)
SELECT * FROM YourTable WITH (INDEX (YourIndexName)) -- by index name
These things may help you.

What's the difference between a Table Scan and a Clustered Index Scan?

Since both a Table Scan and a Clustered Index Scan essentially scan all records in the table, why is a Clustered Index Scan supposedly better?
As an example, what's the performance difference between the following when there are many records?
declare @temp table(
SomeColumn varchar(50)
)
insert into @temp
select 'SomeVal'
select * from @temp
-----------------------------
GO
declare @temp table(
RowID int not null identity(1,1) primary key,
SomeColumn varchar(50)
)
insert into @temp
select 'SomeVal'
select * from @temp
In a table without a clustered index (a heap table), data pages are not linked together - so traversing pages requires a lookup into the Index Allocation Map.
A clustered table, however, has its data pages linked in a doubly linked list, making sequential scans a bit faster. Of course, in exchange, you have the overhead of keeping the data pages in order on INSERT, UPDATE, and DELETE. A heap table, however, requires a second write to the IAM.
If your query has a RANGE operator (e.g.: SELECT * FROM TABLE WHERE Id BETWEEN 1 AND 100), then a clustered table (being in a guaranteed order) would be more efficient - as it could use the index pages to find the relevant data page(s). A heap would have to scan all rows, since it cannot rely on ordering.
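The range argument can be sketched in Python (a model, not SQL Server code): with rows kept in key order you can jump to the first qualifying key and stop after the last one, while a heap offers no order to rely on, so every row is examined. The 10,000 ids are hypothetical, and the "rows examined" count for the ordered case ignores the few comparisons the binary search itself makes.

```python
from bisect import bisect_left, bisect_right


def clustered_range(sorted_ids, low, high):
    """Ordered storage: seek to the first key >= low and read until keys pass high."""
    start = bisect_left(sorted_ids, low)
    stop = bisect_right(sorted_ids, high)
    return sorted_ids[start:stop], stop - start  # (matching rows, rows examined)


def heap_range(unordered_ids, low, high):
    """Heap: no ordering to rely on, so every row must be examined."""
    matches = [i for i in unordered_ids if low <= i <= high]
    return matches, len(unordered_ids)


if __name__ == "__main__":
    ids = list(range(1, 10_001))
    heap_ids = ids[::-1]  # any order works; reversed here only to make the point
    print(clustered_range(ids, 1, 100)[1])
    print(heap_range(heap_ids, 1, 100)[1])
```

Both return the same 100 rows for BETWEEN 1 AND 100, but the ordered version examines 100 rows while the heap version examines all 10,000.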
And, of course, a clustered index lets you do a CLUSTERED INDEX SEEK, which is pretty much optimal for performance...a heap with no indexes would always result in a table scan.
So:
For your example query where you select all rows, the only difference is the doubly linked list a clustered index maintains. This should make your clustered table just a tiny bit faster than a heap with a large number of rows.
For a query with a WHERE clause that can be (at least partially) satisfied by the clustered index, you'll come out ahead because of the ordering - so you won't have to scan the entire table.
For a query that is not satisfied by the clustered index, you're pretty much even...again, the only difference being that doubly linked list for sequential scanning. In either case, you're suboptimal.
For INSERT, UPDATE, and DELETE a heap may or may not win. The heap doesn't have to maintain order, but does require a second write to the IAM. I think the relative performance difference would be negligible, but also pretty data dependent.
Microsoft has a whitepaper which compares a clustered index to an equivalent non-clustered index on a heap (not exactly the same as I discussed above, but close). Their conclusion is basically to put a clustered index on all tables. I'll do my best to summarize their results (again, note that they're really comparing a non-clustered index to a clustered index here - but I think it's relatively comparable):
INSERT performance: clustered index wins by about 3% due to the second write needed for a heap.
UPDATE performance: clustered index wins by about 8% due to the second lookup needed for a heap.
DELETE performance: clustered index wins by about 18% due to the second lookup needed and the second delete needed from the IAM for a heap.
single SELECT performance: clustered index wins by about 16% due to the second lookup needed for a heap.
range SELECT performance: clustered index wins by about 29% due to the random ordering for a heap.
concurrent INSERT: heap table wins by 30% under load due to page splits for the clustered index.
http://msdn.microsoft.com/en-us/library/aa216840(SQL.80).aspx
The Clustered Index Scan logical and physical operator scans the clustered index specified in the Argument column. When an optional WHERE:() predicate is present, only those rows that satisfy the predicate are returned. If the Argument column contains the ORDERED clause, the query processor has requested that the rows' output be returned in the order in which the clustered index has sorted them. If the ORDERED clause is not present, the storage engine will scan the index in the optimal way (not guaranteeing the output to be sorted).
http://msdn.microsoft.com/en-us/library/aa178416(SQL.80).aspx
The Table Scan logical and physical operator retrieves all rows from the table specified in the Argument column. If a WHERE:() predicate appears in the Argument column, only those rows that satisfy the predicate are returned.
A table scan has to examine every single row of the table. The clustered index scan only needs to scan the index. It doesn't scan every record in the table. That's the point, really, of indices.