What does "Clustered Index Scan (Clustered)" mean in a SQL Server execution plan?

I have a query that fails to execute with "Could not allocate a new page for database 'TEMPDB' because of insufficient disk space in filegroup 'DEFAULT'".
While troubleshooting, I am examining the execution plan. There are two costly steps labeled "Clustered Index Scan (Clustered)", and I am having a hard time finding out what this means.
I would appreciate any explanation of "Clustered Index Scan (Clustered)", or suggestions on where to find the relevant documentation.

I would appreciate any explanations to "Clustered Index Scan (Clustered)"
I will try to put it as simply as possible. To understand it, you need to understand both index seeks and index scans.
So let's build a table:
use tempdb
GO
create table scanseek (id int, name varchar(50) default ('some random names'))
create clustered index IX_ID_scanseek on scanseek(ID)
-- T-SQL local variables use the @ prefix (# denotes a temporary table)
declare @i int
SET @i = 0
while (@i < 5000)
begin
insert into scanseek
select @i, 'Name' + convert(varchar(5), @i)
set @i = @i + 1
END
An index seek is where SQL Server uses the B-tree structure of the index to go directly to the matching records.
You can check the root and leaf levels of your table's index using the DMV below:
-- check index level
SELECT
index_level
,record_count
,page_count
,avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID('tempdb'),OBJECT_ID('scanseek'),NULL,NULL,'DETAILED')
GO
Now we have a clustered index on column "id".
Let's look for a directly matching record:
select * from scanseek where id = 340
and look at the execution plan.
You requested the row directly by its clustered index key, which is why you got a Clustered Index SEEK.
Clustered Index Scan: SQL Server reads through the rows of the clustered index from top to bottom.
This happens, for example, when searching on a non-key column. In our table, name is a non-key column, so if we search for data in the name column we will see a clustered index scan, because all the rows live at the leaf level of the clustered index.
Example:
select * from scanseek where name = 'Name340'
Please note: I kept this answer short for ease of understanding. If you have any questions or suggestions, please comment below.
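To compare the two plans side by side, here is a minimal sketch that also reports logical reads. It assumes the scanseek table in tempdb (it recreates it with a set-based insert if the script above has not been run). SET STATISTICS IO ON prints reads to the Messages tab. The index IX_name_scanseek is a hypothetical addition, not part of the original answer; it shows that an index keyed on name should turn the scan into a seek.

```sql
USE tempdb;
GO
-- Recreate the demo table only if it does not exist yet
-- (set-based equivalent of the WHILE loop above: ids 0..4999).
IF OBJECT_ID(N'dbo.scanseek') IS NULL
BEGIN
    CREATE TABLE dbo.scanseek (id int, name varchar(50) DEFAULT ('some random names'));
    CREATE CLUSTERED INDEX IX_ID_scanseek ON dbo.scanseek(id);
    INSERT INTO dbo.scanseek (id, name)
    SELECT n, 'Name' + CONVERT(varchar(5), n)
    FROM (SELECT TOP (5000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS n
          FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b) AS nums;
END
GO
SET STATISTICS IO ON;   -- logical reads per statement appear in the Messages tab

-- Predicate on the clustering key: expect a Clustered Index Seek.
SELECT * FROM dbo.scanseek WHERE id = 340;

-- Predicate on the non-key column: expect a Clustered Index Scan.
SELECT * FROM dbo.scanseek WHERE name = 'Name340';

-- Hypothetical index on name. The clustering key (id) is stored in every
-- nonclustered index row, so this index alone can answer SELECT * here.
IF NOT EXISTS (SELECT 1 FROM sys.indexes
               WHERE object_id = OBJECT_ID(N'dbo.scanseek') AND name = N'IX_name_scanseek')
    CREATE NONCLUSTERED INDEX IX_name_scanseek ON dbo.scanseek(name);
GO
-- Same query in a new batch: the plan should now show an Index Seek on IX_name_scanseek.
SELECT * FROM dbo.scanseek WHERE name = 'Name340';

SET STATISTICS IO OFF;
```

The scan should report noticeably more logical reads than either seek, because it touches every leaf page of the clustered index.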

Expanding on Gordon's answer in the comments, a clustered index scan reads through the table's clustered index to find the values that match your WHERE clause filter, or the values needed for a join to the next table in your query plan.
Tables can have multiple indexes (one clustered and many non-clustered) and SQL Server will search the appropriate one based upon the filter or join being executed.
Clustered Indexes are explained pretty well on MSDN. The key difference between clustered and non-clustered is that the clustered index defines how rows are stored on disk.
If your clustered index is very expensive to search due to the number of records, you may want to add a non-clustered index on the table for fields that you search for often, such as date fields used for filtering ranges of records.
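As an illustration of that advice, here is a minimal sketch with a hypothetical dbo.OrdersDemo table (table, column and index names are assumptions, not from the question; DROP TABLE IF EXISTS needs SQL Server 2016 or later). A nonclustered index on the date column lets a range filter seek instead of scanning the clustered index.

```sql
-- Hypothetical demo table, clustered on an identity key.
DROP TABLE IF EXISTS dbo.OrdersDemo;
CREATE TABLE dbo.OrdersDemo (
    OrderID   int IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    OrderDate date NOT NULL,
    Amount    decimal(10, 2) NOT NULL
);

-- Nonclustered index on the column used for range filtering. INCLUDE (Amount)
-- makes it cover the query below; OrderID comes along as the clustering key.
CREATE NONCLUSTERED INDEX IX_OrdersDemo_OrderDate
    ON dbo.OrdersDemo (OrderDate)
    INCLUDE (Amount);

-- A range predicate on OrderDate can now be answered with an Index Seek.
SELECT OrderID, OrderDate, Amount
FROM dbo.OrdersDemo
WHERE OrderDate >= '2024-01-01' AND OrderDate < '2024-02-01';
```

Without the INCLUDE, each matching row would need a Key Lookup into the clustered index, and for wide ranges the optimizer may go back to a clustered index scan.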

A clustered index is one in which the terminal (leaf) node of the index is the actual data page itself. There can be only one clustered index per table, because it specifies how records are arranged within the data page. It is generally (and with some exceptions) considered the most performant index type (primarily because there is one less level of indirection before you get to your actual data record).
A "clustered index scan" means that the SQL engine reads through your clustered index (that is, the table itself) row by row, looking for a particular value (or set of values). For a selective filter it is usually beaten by a "clustered index seek", in which the SQL engine navigates the B-tree directly to the matching key value(s).
The error message has absolutely nothing to do with the query plan. It just means that you are out of space on TempDB.
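To see where tempdb space is going, here is a sketch that uses the sys.dm_db_file_space_usage DMV (pages are 8 KB, so dividing by 128 gives MB). Internal objects include work tables for sorts, spools and hash operations. If that figure grows sharply while the failing query runs, the space is being consumed by operators in that query's plan.

```sql
-- Free and reserved space in tempdb, in MB (8 KB pages / 128 = MB).
SELECT
    SUM(unallocated_extent_page_count)       / 128.0 AS free_mb,
    SUM(user_object_reserved_page_count)     / 128.0 AS user_objects_mb,
    SUM(internal_object_reserved_page_count) / 128.0 AS internal_objects_mb,
    SUM(version_store_reserved_page_count)   / 128.0 AS version_store_mb
FROM tempdb.sys.dm_db_file_space_usage;
```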

I have been having issues with performance and timeouts due to a clustered index scan. However, another seemingly identical database did not have the same issue.
It turns out the COMPATIBILITY_LEVEL setting was different on the two databases: the one with COMPATIBILITY_LEVEL 100 was using the scan, and the one with level 130 wasn't. The performance difference is huge (from more than 1 minute to less than 1 second for the same query).
ALTER DATABASE [mydb] SET COMPATIBILITY_LEVEL = 130
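Before changing anything, it can help to confirm the levels side by side; mydb is the placeholder name from the answer. One plausible explanation for the plan change, which the answer does not confirm, is that compatibility level 120 and above use the newer cardinality estimator. That changes row estimates and therefore plan choices.

```sql
-- List every database with its compatibility level (filter on name to narrow it down).
SELECT name, compatibility_level
FROM sys.databases
ORDER BY name;
```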

If you hover over the step in the query plan, SSMS displays a description of what the step does. That will give you a baseline understanding of "Clustered Index Scan (Clustered)" and all other steps involved.
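If you would rather see the operator names as text (for example, to paste into a question), SET SHOWPLAN_TEXT returns the estimated plan instead of running the query. This sketch assumes the scanseek demo table in tempdb from the first answer and creates an empty copy if it is missing. Without an index on name, the output should include a Clustered Index Scan operator.

```sql
USE tempdb;
GO
-- Make sure the demo table exists (empty is fine: the query is only compiled).
IF OBJECT_ID(N'dbo.scanseek') IS NULL
BEGIN
    CREATE TABLE dbo.scanseek (id int, name varchar(50));
    CREATE CLUSTERED INDEX IX_ID_scanseek ON dbo.scanseek(id);
END
GO
-- While SHOWPLAN_TEXT is ON, statements are compiled but not executed, and the
-- estimated plan comes back as rows of text, one operator per row.
SET SHOWPLAN_TEXT ON;
GO
SELECT * FROM dbo.scanseek WHERE name = 'Name340';
GO
SET SHOWPLAN_TEXT OFF;
GO
```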

Related

Adding specific index to SQL Server table to improve performance

I have a slow query on a table.
SELECT (some columns)
FROM Table
This table has an ID (integer, IDENTITY(1,1)) primary key, which is the only index on this table.
The query has a WHERE clause:
WHERE Field05 <> 1
AND (Field01 LIKE '%something%' OR Field02 LIKE '%something%' OR
Field03 LIKE'%something%' OR Field04 LIKE'%something%')
Field05 is bit, not null
Field01 is NVarchar(255)
Field02 is NVarchar(255)
Field03 is Nchar(11)
Field04 is Varchar(50)
The execution plan shows a "Clustered index scan" resulting in a slow execution.
I tried adding indexes:
CREATE NONCLUSTERED INDEX IX_Aziende_RagSoc ON dbo.Aziende (Field01);
CREATE NONCLUSTERED INDEX IX_Aziende_Nome ON dbo.Aziende (Field02);
CREATE NONCLUSTERED INDEX IX_Aziende_PIVA ON dbo.Aziende (Field03);
CREATE NONCLUSTERED INDEX IX_Aziende_CodFisc ON dbo.Aziende (Field04);
CREATE NONCLUSTERED INDEX IX_Aziende_Eliminata ON dbo.Aziende (Field05);
Same performance, and again the execution plan shows a "Clustered index scan".
I removed these 5 indexes and added only ONE index:
CREATE NONCLUSTERED INDEX IX_Aziende_Ricerca
ON Aziende (Field05)
INCLUDE (Field01, Field02, Field03, Field04)
Same performance, but this time the execution plan changes.
It is more complex, but still slow.
I removed this index and added a different index:
CREATE NONCLUSTERED INDEX IX_Aziende_Ricerca
ON Aziende (Field05,Field01,Field02,Field03,Field04)
Same performance; the execution plan stays as in the previous case.
The execution is still slow.
I have no other ideas... can someone help?
This is too long for a comment.
First, you should use Field05 = 0 rather than Field05 <> 1. Equality is both easier to read and better for the optimizer. It won't make a difference in this particular case, unless you have a clustered index starting with Field05 or if almost all values are 1 (that is, the 0 is highly selective).
Second, in general, you can only optimize string pattern matching using a full text index. This in turn has other limitations, such as looking for words or prefixes (but not starting with wildcards).
The one exception is if "something" is a constant. In that case, you could add persisted computed columns with indexes to capture whether the value is present in these columns. However, I'm guessing that "something" is not constant.
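A minimal sketch of that idea, assuming "something" really is a fixed constant. It uses a hypothetical stand-in table dbo.AziendeDemo (DROP TABLE IF EXISTS needs SQL Server 2016+). Indexes on computed columns require ANSI_NULLS and QUOTED_IDENTIFIER ON, which are set explicitly here because sqlcmd defaults QUOTED_IDENTIFIER to OFF.

```sql
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO
-- Hypothetical stand-in for dbo.Aziende with the question's column types.
DROP TABLE IF EXISTS dbo.AziendeDemo;
CREATE TABLE dbo.AziendeDemo (
    ID      int IDENTITY(1,1) PRIMARY KEY,
    Field01 nvarchar(255) NULL,
    Field02 nvarchar(255) NULL,
    Field03 nchar(11)     NULL,
    Field04 varchar(50)   NULL,
    Field05 bit           NOT NULL,
    -- Persisted flag: computed once when a row is written, not on every search.
    HasSomething AS CAST(CASE WHEN Field01 LIKE N'%something%'
                                OR Field02 LIKE N'%something%'
                                OR Field03 LIKE N'%something%'
                                OR Field04 LIKE '%something%'
                              THEN 1 ELSE 0 END AS bit) PERSISTED
);
-- Indexing the flag lets the search become an Index Seek instead of a scan.
CREATE NONCLUSTERED INDEX IX_AziendeDemo_HasSomething
    ON dbo.AziendeDemo (HasSomething, Field05);

INSERT INTO dbo.AziendeDemo (Field01, Field02, Field03, Field04, Field05)
VALUES (N'ACME something Srl', NULL, NULL, NULL, 0),
       (N'Other Srl',          NULL, NULL, NULL, 0);

SELECT ID, Field01
FROM dbo.AziendeDemo
WHERE HasSomething = 1 AND Field05 = 0;
```

The cost of the pattern matching moves to INSERT and UPDATE time, which is why this only works when the searched value is known in advance.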
That leaves you with full text indexes or with reconsidering your data model. Perhaps you are storing things in strings -- like lists of tags -- that should really be in a separate table.
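For the full-text route, here is a sketch assuming a user database where the Full-Text Search feature is installed. The table dbo.AziendeFtDemo and the catalog ftAziendeDemo are hypothetical names. Population runs asynchronously, so a query issued immediately after creation may return no rows yet.

```sql
DROP TABLE IF EXISTS dbo.AziendeFtDemo;
CREATE TABLE dbo.AziendeFtDemo (
    ID      int IDENTITY(1,1) NOT NULL,
    Field01 nvarchar(255) NULL,
    Field02 nvarchar(255) NULL,
    Field05 bit NOT NULL,
    -- A full-text index needs a unique, single-column, non-nullable key index.
    CONSTRAINT PK_AziendeFtDemo PRIMARY KEY (ID)
);
GO
IF NOT EXISTS (SELECT 1 FROM sys.fulltext_catalogs WHERE name = N'ftAziendeDemo')
    CREATE FULLTEXT CATALOG ftAziendeDemo;
GO
CREATE FULLTEXT INDEX ON dbo.AziendeFtDemo (Field01, Field02)
    KEY INDEX PK_AziendeFtDemo
    ON ftAziendeDemo;
GO
-- Word/prefix search: matches words starting with "something".
-- Unlike LIKE '%something%', it cannot match text in the middle of a word.
SELECT ID
FROM dbo.AziendeFtDemo
WHERE Field05 = 0
  AND CONTAINS((Field01, Field02), N'"something*"');
```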
Just to chime in with a few comments.
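SQL Server tends to do a table scan even if an index is present, unless it estimates that the searched value matches only a small fraction of the rows (roughly less than 1%). With this in mind, there is rarely going to be any value in an index on a bit field on its own (with only two possible values, each value typically matches a large share of the rows, around 50%!).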
One option you might consider is to create a Filtered Index (WHERE Field05 = 0) Then you can include your other fields in this index.
Note this will only help you if you are not selecting any other columns from the table.
Can you check what proportion of your data has Field05 = 0? If this is small (e.g. under 10%), then a filtered index might help.
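A quick way to answer that question: the sketch below runs against a hypothetical temp table #Aziende with a few sample rows. Against the real data, replace #Aziende with dbo.Aziende.

```sql
-- Hypothetical stand-in with sample data: one row in four has Field05 = 0.
DROP TABLE IF EXISTS #Aziende;
CREATE TABLE #Aziende (ID int IDENTITY(1,1) PRIMARY KEY, Field05 bit NOT NULL);
INSERT INTO #Aziende (Field05) VALUES (0), (1), (1), (1);

-- Share of rows with Field05 = 0; a small percentage favours a filtered index.
SELECT
    COUNT(*)                                      AS total_rows,
    SUM(CASE WHEN Field05 = 0 THEN 1 ELSE 0 END)  AS zero_rows,
    100.0 * SUM(CASE WHEN Field05 = 0 THEN 1 ELSE 0 END)
          / NULLIF(COUNT(*), 0)                   AS pct_zero
FROM #Aziende;
```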
I can't see any way that you can avoid a scan of some sort though - The best you can get is probably an Index scan.
Another option (essentially the same thing!) is to create a schema-bound indexed view with all the columns you need and with the Field05 = 0 filter hard-coded into the view.
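A sketch of such an indexed view, using a hypothetical stand-in table dbo.AziendeViewDemo and view dbo.vAziendeDemoActive (SQL Server 2016+ for DROP ... IF EXISTS). On editions other than Enterprise, the optimizer only uses the view's index when the query references the view WITH (NOEXPAND).

```sql
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO
-- Drop the view first: a schema-bound view blocks dropping its base table.
DROP VIEW IF EXISTS dbo.vAziendeDemoActive;
DROP TABLE IF EXISTS dbo.AziendeViewDemo;
CREATE TABLE dbo.AziendeViewDemo (
    ID      int IDENTITY(1,1) PRIMARY KEY,
    Field01 nvarchar(255) NULL,
    Field02 nvarchar(255) NULL,
    Field05 bit NOT NULL
);
GO
-- SCHEMABINDING is required before a view can be indexed; the filter is hard-coded.
CREATE VIEW dbo.vAziendeDemoActive
WITH SCHEMABINDING
AS
SELECT ID, Field01, Field02
FROM dbo.AziendeViewDemo
WHERE Field05 = 0;
GO
-- The unique clustered index materializes only the Field05 = 0 rows.
CREATE UNIQUE CLUSTERED INDEX IX_vAziendeDemoActive ON dbo.vAziendeDemoActive (ID);
GO
SELECT ID, Field01, Field02
FROM dbo.vAziendeDemoActive WITH (NOEXPAND)
WHERE Field01 LIKE N'%something%';
```

The LIKE with a leading wildcard is still a scan, but it scans only the narrower, pre-filtered rows.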
Again, unless you are certain that the selected column list is going to be a tiny proportion of the columns in the table, SQL Server will probably be faster with a table scan. If you only ever select a handful of columns from a very wide table, then an index covering those columns might help: even though it will still be a scan, there will be more rows per page than when scanning the full table.
So in summary: if you can guarantee that only a small subset of the table's columns will be selected,
AND Field05 = 0 represents a minority of the rows in the table, then a filtered index with INCLUDEs can be of value.
EG
CREATE NONCLUSTERED INDEX ix ON dbo.Aziende(ID) INCLUDE (Field01, Field02, Field03, Field04 /* , other columns used by the SELECT */) WHERE (Field05 = 0)
Good Luck!
After a lot of struggling, I gave up on the idea of adding an index.
Indexes changed nothing.
I changed the C# code that builds the query, so that it now works out the meaning of the "something" parameter received by the function:
If it is of type 1, then I build a WHERE on Field01
If it is of type 2, then I build a WHERE on Field02
If it is of type 3, then I build a WHERE on Field03
If it is of type 4, then I build a WHERE on Field04
This way, execution time dropped to about a quarter of what it was before.
Customers are satisfied.
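The answer did this in C#. As a rough T-SQL equivalent, here is a sketch that searches a single column chosen by a type code. It uses a hypothetical temp table #Aziende (visible inside sp_executesql), and the mapping 1..4 to Field01..Field04 is assumed. The column name comes from a fixed whitelist and is QUOTENAME-d, while the pattern stays a real parameter.

```sql
-- Hypothetical stand-in for dbo.Aziende with two sample rows.
DROP TABLE IF EXISTS #Aziende;
CREATE TABLE #Aziende (
    ID int IDENTITY(1,1) PRIMARY KEY,
    Field01 nvarchar(255), Field02 nvarchar(255), Field03 nchar(11), Field04 varchar(50),
    Field05 bit NOT NULL
);
INSERT INTO #Aziende (Field01, Field02, Field03, Field04, Field05)
VALUES (N'ACME something', N'Mario',     N'01234567890', 'RSSMRA', 0),
       (N'Other Srl',      N'something', N'11111111111', 'BNCLGU', 0);

DECLARE @SearchType int = 1;                  -- assumed mapping: 1..4 -> Field01..Field04
DECLARE @Pattern nvarchar(300) = N'%something%';
DECLARE @col sysname = CASE @SearchType
    WHEN 1 THEN N'Field01' WHEN 2 THEN N'Field02'
    WHEN 3 THEN N'Field03' WHEN 4 THEN N'Field04' END;

-- One LIKE on one column instead of four ORed LIKEs.
DECLARE @sql nvarchar(max) =
    N'SELECT ID, Field01, Field02, Field03, Field04 FROM #Aziende WHERE Field05 = 0 AND '
    + QUOTENAME(@col) + N' LIKE @p;';
EXEC sys.sp_executesql @sql, N'@p nvarchar(300)', @p = @Pattern;
```

It is still a scan, but it evaluates one pattern per row instead of four, which is in line with the speed-up the answer reports.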

Optimize the Clustered Index Scan into Clustered Index Seek

Here is the scenario: I have a table with 40 columns, and I have to select all of its data (including all columns). I have created a clustered index on the table, and the plan shows a Clustered Index Scan when fetching the full data set from the table.
I know that without any filter or join key, SQL Server will choose a Clustered Index Scan instead of a Clustered Index Seek. But I want to optimize the execution plan by turning the Clustered Index Scan into a Clustered Index Seek. Is there any way to achieve this? Please share.
Below is the screenshot of the execution plan:
Something is not quite right in the question / request, because what you are asking for will perform badly. I suspect it comes from a misunderstanding of what a clustered index is.
The clustered index, which is perhaps better described as a clustered table, is the table of data: it's not separate from the table, it is the table. If the data in the table is already ordered by ITEM ID, then the scan is the most efficient access method for your query (especially given the SELECT *). You do not want to seek in this scenario at all, and I don't believe that is your scenario, given the Sort operator.
If the clustered table is ordered based on another field, then you would need an additional non-clustered index to provide the correct order. You would then try to force a plan which was a non-clustered index scan, nested loop to a clustered index seek. That can be achieved using query hints, most likely an INNER LOOP JOIN would cause the seek - but a FORCESEEK also exists which can be used.
Performance wise this second option is never going to win - you are in effect looking at a tipping point notion (https://www.sqlskills.com/blogs/kimberly/the-tipping-point-query-answers/)
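To see both plans for yourself, here is a sketch with a hypothetical temp table #Items that is clustered on a different column (RowID) and has a nonclustered index on ItemID. The ItemID > 0 predicate is a stand-in that gives the forced seek something to seek on. Comparing the two with SET STATISTICS IO ON should show the forced plan doing more reads once the table is large.

```sql
DROP TABLE IF EXISTS #Items;
CREATE TABLE #Items (
    RowID  int IDENTITY(1,1) PRIMARY KEY CLUSTERED,  -- table clustered on another column
    ItemID int NOT NULL,
    Descr  varchar(100) NULL
);
-- Hypothetical nonclustered index that provides ItemID order.
CREATE NONCLUSTERED INDEX IX_Items_ItemID ON #Items (ItemID);
INSERT INTO #Items (ItemID, Descr) VALUES (3, 'c'), (1, 'a'), (2, 'b');

-- Default plan for SELECT *: typically a Clustered Index Scan followed by a Sort.
SELECT * FROM #Items ORDER BY ItemID;

-- Forced plan: an Index Seek on IX_Items_ItemID plus a Key Lookup
-- (a seek into the clustered index) for every row returned.
SELECT * FROM #Items WITH (FORCESEEK) WHERE ItemID > 0 ORDER BY ItemID;
```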
I was trying to achieve the same thing: I wanted an index seek instead of an index scan for my TOP query.
SELECT TOP 5 id FROM mytable
Here is the execution plan being shown for the query:
I even tried the OFFSET ... FETCH NEXT approach; the plan was the same.
To avoid an index scan, I included a dummy primary key filter like below:
SELECT TOP 5 id FROM mytable where id != 0
I know I won't have a 0 value in my primary key, so I added the filter to the TOP query, which resulted in an index seek instead of an index scan:
The query plan comparison shows a similar operator cost for the seek and the scan here. But I think achieving a seek this way makes the database do extra work, because it has to check whether each id is 0, which we don't need at all if we just want the first few records.

SQL Server non-clustered index

I have two different queries in SQL Server and I want to clarify
how the execution plan would be different, and
which of them is more efficient
Queries:
SELECT *
FROM table_name
WHERE column < 2
and
SELECT column
FROM table_name
WHERE column < 2
I have a non-clustered index on column.
I used to use PostgreSQL, and I am not familiar with SQL Server and this kind of index.
From reading many questions here, I took away two points:
When I have a non-clustered index, I need one more step in order to have access to data
With a non-clustered index I could have a copy of part of the table and I get a quicker response time.
So, I got confused.
One more question: when I use "SELECT *", what is the influence of a non-clustered index?
1st query:
Depending on the size of the data, you might face lookup costs such as Key Lookups and RID Lookups.
2nd query:
It will be faster because it does not fetch columns that are not part of the index, though I recommend using a covering index.
I recommend you check this blog post
The first SELECT will use the non-clustered index to find the clustering key (if a clustered index exists) or the page and slot (if there is no clustered index). That will then be used to fetch the row. The query plan will differ depending on your statistics (that is, on the data).
The second query is "covered" by the non-clustered index. What that means is that the non-clustered index contains all of the data that you are selecting. The clustering key is not needed, and the clustered index and/or heap is not needed to provide data to the select list.
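A small sketch of the difference, using a hypothetical temp table #t in place of table_name and column (names are assumptions). It shows the covered query, the non-covered SELECT *, and how INCLUDE can make SELECT * covered as well.

```sql
DROP TABLE IF EXISTS #t;
CREATE TABLE #t (
    id    int IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    col   int NOT NULL,
    other varchar(100) NULL
);
CREATE NONCLUSTERED INDEX IX_t_col ON #t (col);
INSERT INTO #t (col, other) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (1, 'd');

-- Covered: only col is needed, so the nonclustered index alone answers it.
SELECT col FROM #t WHERE col < 2;

-- Not covered: "other" must be fetched with a Key Lookup per matching row,
-- or the optimizer may prefer a Clustered Index Scan instead.
SELECT * FROM #t WHERE col < 2;

-- With "other" included (id is already carried as the clustering key),
-- SELECT * is covered by this index too.
CREATE NONCLUSTERED INDEX IX_t_col_incl ON #t (col) INCLUDE (other);
SELECT * FROM #t WHERE col < 2;
```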

Why NonClustered index scan faster than Clustered Index scan?

As far as I know, heap tables are tables without a clustered index and have no physical order.
I have a heap table "scan" with 120k rows and I am using this select:
SELECT id FROM scan
If I create a non-clustered index for the column "id", I get 223 physical reads.
If I remove the non-clustered index and alter the table to make "id" my primary key (and so my clustered index), I get 515 physical reads.
If the clustered index table is something like this picture:
Why does a Clustered Index Scan work like a table scan (or worse, when retrieving all rows)? Why doesn't it use the "clustered index table" that has fewer blocks and already has the ID that I need?
SQL Server indexes are B-trees. A non-clustered index contains just the indexed columns plus a row locator, so the leaf nodes of its B-tree point to the appropriate row: a RID for a heap, or the clustering key for a clustered table. A clustered index is different: its leaf nodes are the data pages themselves, and the clustered index's B-tree becomes the backing store for the table; the heap ceases to exist for the table.
Your non-clustered index contains a single, presumably integer, column. It's a small, compact index to start with. Your query SELECT id FROM scan has a covering index: the query can be satisfied just by examining the index, which is what is happening. If, however, your query included columns not in the index, and the optimizer elected to use the non-clustered index, an additional lookup would be required to fetch the data pages, either from the clustered index or from the heap.
To understand what's going on, you need to examine the execution plan selected by the optimizer:
See Displaying Graphical Execution Plans
See Red Gate's SQL Server Execution Plans, by Grant Fritchey
A clustered index is generally about as big as the same data in a heap would be (assuming the same page fullness). It should use slightly more reads than a heap because of the additional B-tree levels.
A CI cannot be smaller than a heap would be. I don't see why you would think that. Most of the size of a partition (be it a heap or a tree) is in the data.
Note that fewer physical reads do not necessarily translate to a faster query. Random IO can be 100x slower than sequential IO.
When to use a clustered index:
Query Considerations:
1) Return a range of values by using operators such as BETWEEN, >, >=, <, and <=
2) Return large result sets
3) Use JOIN clauses; typically these are foreign key columns
4) Use ORDER BY, or GROUP BY clauses. An index on the columns specified in the ORDER BY or GROUP BY clause may remove the need for the Database Engine to sort the data, because the rows are already sorted. This improves query performance.
Column Considerations:
Consider columns that have one or more of the following attributes:
1) Are unique or contain many distinct values
2) Defined as IDENTITY because the column is guaranteed to be unique within the table
3) Used frequently to sort the data retrieved from a table
Clustered indexes are not a good choice for the following attributes:
1) Columns that undergo frequent changes
2) Wide keys
When to use a nonclustered index:
Query Considerations:
1) Use JOIN or GROUP BY clauses. Create multiple nonclustered indexes on columns involved in join and grouping operations, and a clustered index on any foreign key columns.
2) Queries that do not return large result sets
3) Contain columns frequently involved in search conditions of a query, such as WHERE clause, that return exact matches
Column Considerations:
Consider columns that have one or more of the following attributes:
1) Cover the query. For more information, see Index with Included Columns
2) Lots of distinct values, such as a combination of last name and first name, if a clustered index is used for other columns
3) Used frequently to sort the data retrieved from a table
Database Considerations:
1) Databases or tables with low update requirements, but large volumes of data can benefit from many nonclustered indexes to improve query performance.
2) Online Transaction Processing applications and databases that contain heavily updated tables should avoid over-indexing. Additionally, indexes should be narrow, that is, with as few columns as possible.
Try running
DBCC DROPCLEANBUFFERS
before the queries if you really want to compare them.
Physical reads are not the same thing as logical reads when optimizing a query.
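A sketch of a cold-cache comparison, for a test server only: DBCC DROPCLEANBUFFERS empties the buffer pool for the whole instance and requires sysadmin. The table dbo.scanDemo is a hypothetical stand-in for the question's scan table. The filler column is an assumption added so that the heap is wider than the one-column index.

```sql
DROP TABLE IF EXISTS dbo.scanDemo;
CREATE TABLE dbo.scanDemo (id int NOT NULL, filler char(100) NOT NULL DEFAULT ('x'));
INSERT INTO dbo.scanDemo (id)
SELECT TOP (120000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b;
CREATE NONCLUSTERED INDEX IX_scanDemo_id ON dbo.scanDemo (id);
-- For the other case, drop IX_scanDemo_id and cluster on id instead:
-- DROP INDEX IX_scanDemo_id ON dbo.scanDemo;
-- ALTER TABLE dbo.scanDemo ADD CONSTRAINT PK_scanDemo PRIMARY KEY CLUSTERED (id);
GO
CHECKPOINT;              -- write dirty pages so DROPCLEANBUFFERS can evict them
DBCC DROPCLEANBUFFERS;   -- cold cache: the next query must read from disk
GO
SET STATISTICS IO ON;    -- compare "logical reads", not only "physical reads"
SELECT id FROM dbo.scanDemo;
SET STATISTICS IO OFF;
```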

Seek & Scan in SQL Server

After googling, I learned that an index seek is better than a scan.
How can I write a query that results in a seek instead of a scan? I have been searching Google, but so far with no luck.
Any simple example with explanation will be appreciated.
Thanks
Search by the primary key column(s)
Search by column(s) with index(es) on them
An index is a data structure that improves the speed of data retrieval operations on a database table. Most databases automatically create an index when a primary key is defined for a table. By default, SQL Server creates the index for a primary key (composite or otherwise) as a "clustered index", but the clustered index doesn't have to be on the primary key; it can be on other columns.
NOTE:
LIKE '%' + criteria + '%' cannot use an index seek; LIKE criteria + '%' can.
Related reading:
SQL SERVER – Index Seek vs. Index Scan
Index
Which is better: Bookmark/Key Lookup or Index Scan
Extending rexem's feedback:
Making the primary key the clustered index isn't a requirement; it's simply a default. Clustered means that rows with neighboring key values are physically placed near each other on SQL Server's 8 KB pages, on the assumption that if you fetch one row by key, you will probably be interested in its neighbors. I don't think that's a good idea for primary keys, since they're usually unique but arbitrary identifiers. It is better to cluster on more useful data. There is one clustered index per table, by the way.
In a nutshell: If you can filter your query on a clustered index column (that makes sense) then all the better.
An index seek is when SQL Server can use a binary search to quickly find the row. The rows in an index are sorted in a particular order, and your query has to specify enough information in the WHERE clause to allow SQL Server to make use of the sorted index.
An index scan is when SQL Server cannot use the sort order of the index, but can still use the index itself. This makes sense if the table rows are very large, but the index is relatively small. SQL Server will only have to read the smaller index from disk.
As a simple example, take a phonebook table:
create table phonebook (
    id int identity primary key,
    lastname varchar(50),
    phonenumber varchar(15)
)
Say that there is an index on (lastname). Then this query will result in an index seek:
select * from phonebook where lastname = 'leno'
This query will result in an index scan:
select * from phonebook where lastname like '%no'
The analogy with a real life phonebook is that you can't look up people whose name ends in 'no'. You have to browse the entire phonebook.
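To try the phonebook example yourself, here is a sketch using a hypothetical temp table #phonebook and index IX_phonebook_lastname. With SELECT *, a seek on lastname also needs a Key Lookup to fetch phonenumber. On a tiny table the optimizer may scan anyway, so the difference shows best with more rows.

```sql
DROP TABLE IF EXISTS #phonebook;
CREATE TABLE #phonebook (
    id          int IDENTITY PRIMARY KEY,
    lastname    varchar(50),
    phonenumber varchar(15)
);
CREATE NONCLUSTERED INDEX IX_phonebook_lastname ON #phonebook (lastname);
INSERT INTO #phonebook (lastname, phonenumber)
VALUES ('leno', '555-0100'), ('bruno', '555-0101'), ('smith', '555-0102');

-- Leading characters known: can be an Index Seek on IX_phonebook_lastname.
SELECT * FROM #phonebook WHERE lastname = 'leno';
SELECT * FROM #phonebook WHERE lastname LIKE 'le%';

-- Leading wildcard: the index order is useless, so expect a scan.
SELECT * FROM #phonebook WHERE lastname LIKE '%no';
```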