Full table scan occurred even when index exists? - sql

We have a sql query as follows
select * from Table where date < '20091010'
However when we look at the query plan, we see
The type of query is SELECT.
FROM TABLE
Worktable1.
Nested iteration.
Table Scan.
Forward scan.
Positioning at start of table.
Using I/O Size 32 Kbytes for data pages.
With MRU Buffer Replacement Strategy for data pages.
which seems to suggest that a full table scan is done. Why is the index not used?

If the majority of your dates are found by applying < '20091010' then the index may well be overlooked in favour of a table scan. What is your distribution of dates within that table? What is the cardinality? Is the index used if you only select date rather than select *?

Unless the index is covering *, the optimizer realizes that a table scan is probably more efficient than an index seek/scan and bookmark lookup. What's the expected selectivity of the date range? Do you have a primary key defined?
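The trade-off both answers describe can be sketched with a toy cost model. This is only an illustration under loud assumptions: the page counts, the index depth of 3, and the "one page read per lookup" rule are hypothetical round numbers, not the formula any real optimizer uses.

```python
# Toy cost model. All numbers are hypothetical and only illustrate the
# "tipping point" idea; real optimizers use far richer statistics.

def table_scan_cost(table_pages):
    """A full scan reads every data page exactly once."""
    return table_pages

def index_lookup_cost(matching_rows, index_depth=3):
    """Walk the index once, then assume one data-page read per matching row."""
    return index_depth + matching_rows

def cheaper_plan(total_rows, rows_per_page, matching_rows):
    """Return the name of the cheaper plan under the toy model."""
    table_pages = -(-total_rows // rows_per_page)  # ceiling division
    scan = table_scan_cost(table_pages)
    seek = index_lookup_cost(matching_rows)
    return "index seek + lookups" if seek < scan else "table scan"

# 1,000,000 rows at 100 rows per page is 10,000 data pages.
print(cheaper_plan(1_000_000, 100, 100))      # very selective predicate
print(cheaper_plan(1_000_000, 100, 600_000))  # most rows match the date filter
```

Under these assumptions, a predicate that matches most of the table makes the per-row lookups far more expensive than simply reading every page, which is why the date filter in the question may lead to a table scan.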

Related

In what case SQL will use an Index of the table

During an interview I was asked a weird question. I could not find the correct answer for it, so posting the question below:
I have an index on a column Stud_Name. I am searching for name using a wild card. My query is
a) select * from Stud_Details where Stud_Name like 'A%'
b) select * from Stud_Details where Stud_Name like '%A'.
c) select * from Stud_Details where Stud_Name not like 'A%'
In which case would the SQL server use the Index, that I have created on Stud_Name?
PS: If this question seems idiotic don't get mad on me, get mad on the interviewer who asked this to me.
Also I don't have any info regarding how the index was created. This info above is all I have.
In what cases can SQL Server use an Index on Stud_Name?
Option (a) is the only one that can be used in an index seek. like 'A%' can be converted to a range seek on >= 'A' and < 'B'.
Option (b) can't use an index seek, as the leading wildcard prevents this. It could still scan an index, though.
Option (c) could in theory be converted to two range seeks (< 'A' OR >= 'B'), but I've just checked and SQL Server does not do that (even in cases where this would eliminate 100% of the table, and even with a FORCESEEK hint). Again, it can scan an index, though.
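The three options can be mimicked outside the database with a sorted list standing in for the index. This is a rough sketch, not how SQL Server implements it: the names list and the helper functions are hypothetical, and the comparison is plain case-sensitive code-point ordering rather than a SQL collation.

```python
from bisect import bisect_left

def prefix_range(prefix):
    """Turn LIKE 'prefix%' into the half-open range [low, high)."""
    # Bump the last character to get the smallest string above every match.
    # Assumes the last character is not the maximum code point.
    high = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, high

def seek_prefix(sorted_names, prefix):
    """Option (a): two binary searches bound the matching slice."""
    low, high = prefix_range(prefix)
    start = bisect_left(sorted_names, low)
    end = bisect_left(sorted_names, high)
    return sorted_names[start:end]

def scan_suffix(sorted_names, suffix):
    """Option (b): sort order does not help, so every entry is inspected."""
    return [name for name in sorted_names if name.endswith(suffix)]

def scan_not_prefix(sorted_names, prefix):
    """Option (c): done here as a scan, matching what the answer observed."""
    return [name for name in sorted_names if not name.startswith(prefix)]

names = sorted(["Adam", "Alice", "Bob", "Clara", "Dana", "Eva"])
print(prefix_range("A"))            # ('A', 'B')
print(seek_prefix(names, "A"))      # ['Adam', 'Alice']
print(scan_suffix(names, "a"))      # ['Clara', 'Dana', 'Eva']
print(scan_not_prefix(names, "A"))  # ['Bob', 'Clara', 'Dana', 'Eva']
```

Only the prefix search can jump straight to its start position; the other two have to look at every entry, even though they can still read the sorted list (the "index") rather than the base table.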
In what cases will SQL Server use an Index on Stud_Name?
This depends on cardinality estimates and whether the index is covering or not and the relative width of the index rows vs the base table rows.
Assuming the index is not covering then any rows found that match the WHERE clause will need lookups to retrieve the column values. The greater the number of estimated lookups the less likely the non covering index is to be used.
For b+c the choice is index scan + lookups vs table scan with no lookups. The favourability of doing an index scan will be higher if the index is much narrower than the table. If they are similar sizes there is not much IO benefit from reading the index rather than the table in the first place.

How to choose index scan or table scan when query in SequoiaDB?

The scenario is as follows.
Using PG to execute this query:
Select count(*) from t where DATETIME >'2018-07-27 10.12.12.000000' and DATETIME < '2018-07-28 10.12.12.000000'
It returns 22 and executes quickly.
When the query condition includes "=":
Select count(*) from t where DATETIME >='2018-07-27 10.12.12.000000' and DATETIME <= '2018-07-28 10.12.12.000000'
It returns 22 but takes 20s.
I find that the query without "=" chooses an index scan, whereas the query with "=" partly chooses a table scan.
According to your question:
The current indexing mechanism is that the optimizer picks the first available index that matches, so the choice of index depends on the order in which the indexes were created. When a usable index exists, the query will prefer an index scan.
Make sure that the nodes in each data group contain the index; otherwise the data nodes without the index will fall back to a table scan.
Run analyze to optimize the query. Analyze is a new feature in SequoiaDB v3.0. It analyzes collections and index data, collects statistics, and uses them to choose between an index scan and a table scan. For usage details, see: http://doc.sequoiadb.com/cn/index-cat_id-1496923440-edition_id-300
View the access plan with find.explain() to check the query cost.

Optimize the Clustered Index Scan into Clustered Index Seek

Here is the scenario: I have a table with 40 columns and I have to select all the data in the table (including all columns). I have created a clustered index on the table, and the plan shows a Clustered Index Scan when fetching the full data set.
I know that without any filter or join key, SQL Server will choose a Clustered Index Scan instead of a Clustered Index Seek. But I want to optimize the execution plan by turning the Clustered Index Scan into a Clustered Index Seek. Is there any way to achieve this? Please share.
Below is the screenshot of the execution plan:
Something is not quite right in the question / request, because what you are asking for will perform badly. I suspect it comes from misunderstanding what a clustered index is.
The clustered index, which is perhaps better described as a clustered table, is the table of data. It's not separate from the table; it is the table. If the data in the table is already ordered by ITEM ID, then the scan is the most efficient access method for your query (especially given the select *). You do not want a seek in this scenario at all, and I don't believe that this is your scenario, given the sort operator.
If the clustered table is ordered based on another field, then you would need an additional non-clustered index to provide the correct order. You would then try to force a plan which was a non-clustered index scan, nested loop to a clustered index seek. That can be achieved using query hints, most likely an INNER LOOP JOIN would cause the seek - but a FORCESEEK also exists which can be used.
Performance wise this second option is never going to win - you are in effect looking at a tipping point notion (https://www.sqlskills.com/blogs/kimberly/the-tipping-point-query-answers/)
Well, I was trying to achieve the same, I wanted an index seek instead of an index scan on my top query.
SELECT TOP 5 id FROM mytable
Here is the execution plan being shown for the query:
I even tried the Offset Fetch Next approach, the plan was same.
To avoid an index scan, I included a fake primary key filter like below:
SELECT TOP 5 id FROM mytable where id != 0
I know I won't have a 0 value in my primary key, so I added that filter to the TOP query, which then resolved to an index seek instead of an index scan:
Even so, the query plan comparison shows similar operation costs for the index seek and the index scan here. I also think that getting an index seek this way adds extra work for the database, because it has to check whether each id is 0 or not, which is entirely unnecessary when we only want the top few records.

What "Clustered Index Scan (Clustered)" means on SQL Server execution plan?

I have a query that fails to execute with "Could not allocate a new page for database 'TEMPDB' because of insufficient disk space in filegroup 'DEFAULT'".
While troubleshooting, I am examining the execution plan. There are two costly steps labeled "Clustered Index Scan (Clustered)". I am having a hard time finding out what this means.
I would appreciate any explanation of "Clustered Index Scan (Clustered)" or suggestions on where to find the related documentation.
I will try to put this in the simplest terms. To understand it, you need to understand both index seeks and index scans.
So let's build the table:
use tempdb
GO
create table scanseek (id int, name varchar(50) default ('some random names'))
create clustered index IX_ID_scanseek on scanseek(ID)
-- populate 5000 rows with ids 0..4999
declare @i int
set @i = 0
while (@i < 5000)
begin
    insert into scanseek
    select @i, 'Name' + convert(varchar(5), @i)
    set @i = @i + 1
end
An index seek is where SQL Server uses the B-tree structure of the index to seek directly to matching records.
You can check the table's root and leaf levels using the DMV below:
-- check index level
SELECT
index_level
,record_count
,page_count
,avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID('tempdb'),OBJECT_ID('scanseek'),NULL,NULL,'DETAILED')
GO
Now we have a clustered index on column "ID".
Let's look for some directly matching records:
select * from scanseek where id =340
and look at the execution plan.
You requested rows directly by key in the query, which is why you got a clustered index SEEK.
Clustered index scan: SQL Server reads through the rows in the clustered index from top to bottom,
for example when searching on a non-key column. In our table, NAME is a non-key column, so if we search for data in the name column we will see a clustered index scan, because all the rows are at the clustered index leaf level.
Example
select * from scanseek where name = 'Name340'
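The difference between the two access paths can be imitated in plain Python, with a list ordered by id standing in for the clustered index. This is only a rough analogy under stated assumptions: the rows list, both helper functions, and the probe and touch counters are hypothetical, and a real B-tree reads pages rather than individual rows.

```python
# 5000 rows ordered by id, mirroring the scanseek table built above.
rows = [(i, "Name" + str(i)) for i in range(5000)]

def seek_by_id(target):
    """Binary search on the ordering key, like a seek down the B-tree."""
    lo, hi, probes = 0, len(rows), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1  # one key comparison per step
        if rows[mid][0] < target:
            lo = mid + 1
        else:
            hi = mid
    found = rows[lo] if lo < len(rows) and rows[lo][0] == target else None
    return found, probes

def scan_by_name(target):
    """The name column is not ordered, so every row has to be inspected."""
    touched = 0
    matches = []
    for row in rows:
        touched += 1
        if row[1] == target:
            matches.append(row)
    return matches, touched

row, probes = seek_by_id(340)
matches, touched = scan_by_name("Name340")
print(row, probes)       # the seek needs only about a dozen comparisons
print(matches, touched)  # the scan touches all 5000 rows
```

Both calls return the same row; the seek gets there by halving the search space on the ordered key, while the scan has no ordering to exploit on name and reads everything.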
Please note: I kept this answer short for ease of understanding only; if you have any questions or suggestions, please comment below.
Expanding on Gordon's answer in the comments, a clustered index scan means scanning one of the table's indexes to find the values needed for a WHERE clause filter, or for a join to the next table in your query plan.
Tables can have multiple indexes (one clustered and many non-clustered) and SQL Server will search the appropriate one based upon the filter or join being executed.
Clustered Indexes are explained pretty well on MSDN. The key difference between clustered and non-clustered is that the clustered index defines how rows are stored on disk.
If your clustered index is very expensive to search due to the number of records, you may want to add a non-clustered index on the table for fields that you search for often, such as date fields used for filtering ranges of records.
A clustered index is one in which the terminal (leaf) node of the index is the actual data page itself. There can be only one clustered index per table, because it specifies how records are arranged within the data page. It is generally (and with some exceptions) considered the most performant index type (primarily because there is one less level of indirection before you get to your actual data record).
A "clustered index scan" means that the SQL engine is traversing your clustered index in search for a particular value (or set of values). It is one of the most efficient methods for locating a record (beat by a "clustered index seek" in which the SQL Engine is looking to match a single selected value).
The error message has absolutely nothing to do with the query plan. It just means that you are out of space on TempDB.
I have been having issues with performance and timeouts due to a clustered index scan. However another seemingly identical database did not have the same issue.
It turns out the COMPATIBILITY_LEVEL setting on the db was different: the version with COMPATIBILITY_LEVEL 100 was using the scan, while the db with level 130 wasn't. The performance difference is huge (from more than 1 minute to less than 1 second for the same query).
ALTER DATABASE [mydb] SET COMPATIBILITY_LEVEL = 130
If you hover over the step in the query plan, SSMS displays a description of what the step does. That will give you a baseline understanding of "Clustered Index Scan (Clustered)" and all other steps involved.

Why am I getting a Clustered Index Scan when the column is indexed?

So, we have a table, InventoryListItems, that has several columns. Because we're going to be looking for rows at times based on a particular column (g_list_id, a foreign key), we have that foreign key column placed into a non-clustered index we'll call MYINDEX.
So when I search for data like this:
-- fake data for example
DECLARE @ListId uniqueidentifier
SELECT @ListId = '7BCD0E9F-28D9-4F40-BD67-803005179B04'
SELECT *
FROM [dbo].[InventoryListItems]
WHERE [g_list_id] = @ListId
I expected that it would use the MYINDEX index to find just the needed rows, and then look up the information in those rows. So not as good as just finding everything we need in the index itself, but still a big win over doing a full scan of the table.
But instead it seems that I'm still getting a clustered index scan. I can't figure out why that would happen.
If I do something like SELECTing only the values in the included columns of the index, it does what I would expect, an index seek, and just pulls everything from the index.
But if I SELECT *, why does it just bail on the index and do a scan when it seems like it would still benefit greatly from using it because it's referenced in the WHERE clause?
Since you're doing a SELECT * and thus you retrieve all columns, SQL Server's query optimizer may have decided it's easier and more efficient to just do a clustered index scan - since it needs to go to the clustered index leaf level to get all the columns anyway (and doing a seek first, and then a key lookup to actually get the whole data page, is quite an expensive operation - scan might just be more efficient in this setup).
I'm almost sure if you try
SELECT g_list_id
FROM [dbo].[InventoryListItems]
WHERE [g_list_id] = @ListId
then there will be an index seek instead (since you're only retrieving a single column - not everything).
That's one of the reasons why I would recommend to be extra careful when using SELECT * .... - try to avoid it if ever possible.
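The covering-index effect can be observed in a small, self-contained experiment. The sketch below uses SQLite through Python's sqlite3 module purely as a stand-in: its optimizer and plan output differ from SQL Server's, and the description column and the plan helper are hypothetical. It only shows that a query answerable from the index alone is planned differently from SELECT *.

```python
import sqlite3

# In-memory SQLite database as a stand-in engine (not SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE InventoryListItems (
        id INTEGER PRIMARY KEY,
        g_list_id TEXT NOT NULL,
        description TEXT          -- hypothetical extra column
    );
    CREATE INDEX MYINDEX ON InventoryListItems (g_list_id);
""")

def plan(sql, params=()):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # detail is the last column

list_id = "7BCD0E9F-28D9-4F40-BD67-803005179B04"

# Only the indexed column is requested: the index alone can answer it.
narrow = plan(
    "SELECT g_list_id FROM InventoryListItems WHERE g_list_id = ?", (list_id,)
)
# All columns are requested: the base table has to be visited as well.
wide = plan("SELECT * FROM InventoryListItems WHERE g_list_id = ?", (list_id,))

print(narrow)
print(wide)
```

In SQLite the narrow query is reported as using a covering index, while the SELECT * plan is not. Whether SQL Server then prefers seek plus key lookups or a clustered index scan for the wide query depends on its own row estimates, as the answer explains.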