SQL Server non-clustered index - sql

I have two different queries in SQL Server and I want to clarify
how the execution plan would be different, and
which of them is more efficient
Queries:
SELECT *
FROM table_name
WHERE column < 2
and
SELECT column
FROM table_name
WHERE column < 2
I have a non-clustered index on column.
I used to use Postgresql and I am not familiar with SQL Server and these kind of indexes.
As I read many questions here I kept two notes:
When I have a non-clustered index, I need one more step in order to have access to data
With a non-clustered index I could have a copy of part of the table and I get a quicker response time.
So, I got confused.
One more question is that when I have "SELECT *" which is the influence of a non-clustered index?

1st query :
Depending on the size of the data you might face lookup issues such as Key lookup and RID lookups .
2nd query :
It will be faster because it will not fetch columns that are not part of the index , though i recommend using covering index ..
I recommend you check this blog post

The first select will use the non-clustered index to find the clustering key [clustered index exists] or page and slot [no clustered index]. Then that will be used to get the row. The query plan will be different depending on your STATS (the data).
The second query is "covered" by the non-clustered index. What that means is that the non-clustered index contains all of the data that you are selecting. The clustering key is not needed, and the clustered index and/or heap is not needed to provide data to the select list.

Related

What "Clustered Index Scan (Clustered)" means on SQL Server execution plan?

I have a query that fails to execute with "Could not allocate a new page for database 'TEMPDB' because of insufficient disk space in filegroup 'DEFAULT'".
On the way of trouble shooting I am examining the execution plan. There are two costly steps labeled "Clustered Index Scan (Clustered)". I have a hard time find out what this means?
I would appreciate any explanations to "Clustered Index Scan (Clustered)" or suggestions on where to find the related document?
I would appreciate any explanations to "Clustered Index Scan
(Clustered)"
I will try to put in the easiest manner, for better understanding you need to understand both index seek and scan.
SO lets build the table
use tempdb GO
create table scanseek (id int , name varchar(50) default ('some random names') )
create clustered index IX_ID_scanseek on scanseek(ID)
declare #i int
SET #i = 0
while (#i <5000)
begin
insert into scanseek
select #i, 'Name' + convert( varchar(5) ,#i)
set #i =#i+1
END
An index seek is where SQL server uses the b-tree structure of the index to seek directly to matching records
you can check your table root and leaf nodes using the DMV below
-- check index level
SELECT
index_level
,record_count
,page_count
,avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID('tempdb'),OBJECT_ID('scanseek'),NULL,NULL,'DETAILED')
GO
Now here we have clustered index on column "ID"
lets look for some direct matching records
select * from scanseek where id =340
and look at the Execution plan
you've requested rows directly in the query that's why you got a clustered index SEEK .
Clustered index scan: When Sql server reads through for the Row(s) from top to bottom in the clustered index.
for example searching data in non key column. In our table NAME is non key column so if we will search some data in the name column we will see clustered index scan because all the rows are in clustered index leaf level.
Example
select * from scanseek where name = 'Name340'
please note: I made this answer short for better understanding only, if you have any question or suggestion please comment below.
Expanding on Gordon's answer in the comments, a clustered index scan is scanning one of the tables indexes to find the values you are doing a where clause filter, or for a join to the next table in your query plan.
Tables can have multiple indexes (one clustered and many non-clustered) and SQL Server will search the appropriate one based upon the filter or join being executed.
Clustered Indexes are explained pretty well on MSDN. The key difference between clustered and non-clustered is that the clustered index defines how rows are stored on disk.
If your clustered index is very expensive to search due to the number of records, you may want to add a non-clustered index on the table for fields that you search for often, such as date fields used for filtering ranges of records.
A clustered index is one in which the terminal (leaf) node of the index is the actual data page itself. There can be only one clustered index per table, because it specifies how records are arranged within the data page. It is generally (and with some exceptions) considered the most performant index type (primarily because there is one less level of indirection before you get to your actual data record).
A "clustered index scan" means that the SQL engine is traversing your clustered index in search for a particular value (or set of values). It is one of the most efficient methods for locating a record (beat by a "clustered index seek" in which the SQL Engine is looking to match a single selected value).
The error message has absolutely nothing to do with the query plan. It just means that you are out of space on TempDB.
I have been having issues with performance and timeouts due to a clustered index scan. However another seemingly identical database did not have the same issue.
Turns out the COMPATIBILITY_LEVEL flag on the db was different... the version with COMPATIBILITY_LEVEL 100 was using the scan, the db with level 130 wasn't. Performance difference is huge (from more than 1 minute to less that 1 second for same query)
ALTER DATABASE [mydb] SET COMPATIBILITY_LEVEL = 130
If you hover over the step in the query plan, SSMS displays a description of what the step does. That will give you a baseline understanding of "Clustered Index Scan (Clustered)" and all other steps involved.

Why am I getting a Clustered Index Scan when the column is indexed?

So, we have a table, InventoryListItems, that has several columns. Because we're going to be looking for rows at times based on a particlar column (g_list_id, a foreign key), we have that foreign key column placed into a non-clustered index we'll call MYINDEX.
So when I search for data like this:
-- fake data for example
DECLARE #ListId uniqueidentifier
SELECT #ListId = '7BCD0E9F-28D9-4F40-BD67-803005179B04'
SELECT *
FROM [dbo].[InventoryListItems]
WHERE [g_list_id] = #ListId
I expected that it would use the MYINDEX index to find just the needed rows, and then look up the information in those rows. So not as good as just finding everything we need in the index itself, but still a big win over doing a full scan of the table.
But instead it seems that I'm still getting a clustered index scan. I can't figure out why that would happen.
If I do something like SELECTing only the values in the included columns of the index, it does what I would expect, an index seek, and just pulls everything from the index.
But if I SELECT *, why does it just bail on the index and do a scan when it seems like it would still benefit greatly from using it because it's referenced in the WHERE clause?
Since you're doing a SELECT * and thus you retrieve all columns, SQL Server's query optimizer may have decided it's easier and more efficient to just do a clustered index scan - since it needs to go to the clustered index leaf level to get all the columns anyway (and doing a seek first, and then a key lookup to actually get the whole data page, is quite an expensive operation - scan might just be more efficient in this setup).
I'm almost sure if you try
SELECT g_list_id
FROM [dbo].[InventoryListItems]
WHERE [g_list_id] = #ListId
then there will be an index seek instead (since you're only retrieving a single column - not everything).
That's one of the reasons why I would recommend to be extra careful when using SELECT * .... - try to avoid it if ever possible.

is count(indexed column) faster than count(*)? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Performance of COUNT SQL function
Hi all,
I've very large tables and I need to know number of records in each , My question is does it reduce the run time if I run :
select count(indexed column like my PK) from tbTest
instead of
select count(*) from tbTest
see Performance of COUNT SQL function
The important thing to note is they are not equivalent
Since the question is whether or not there is a performance difference, it would depend on the index. When you do COUNT(*), it will use the PK column(s) to determine the number of rows. If you do not have any indexes besides a clustered index on the PK column(s), it will scan the leaf nodes on the clustered index. That's probably a lot of pages. If you have a non clustered index that is skinnier than the clustered index, it will choose that instead, resulting in less reads.
So, if the column you select is contained in the smallest possible non-clustered index on the table, the SQL query optimizer will choose that for both count() (if you have a clustered ix that is the PK) and count(indexed_column). If you choose a count(indexed_col) that is only contained in a wide index, then the count() will be faster if your PK is a clustered index. The reason this works is that there is a pointer to the clustered index in all non-clustered indexes and SQL Server can figure out the number of rows based on that non-clustered index.
So, as usual in SQL Server, it depends. Do a showplan and compare the queries to each other.
SELECT COUNT(*) may be faster. That is because using * gives the optimizer liberty to choose any column to count on. Say you have a primary key on a INT column, and a non clustered key on a different bigint column. But the primary key is likely the clustered index, and as such it is in fact significantly larger than the nonclustered bigint index (has more pages). So if the optimizer is free to choose the bigint non-clustered index, it can return the response faster. Possible much faster, depending on the table.
So overall is always better to leave it as COUNT(*) and let the optimizer choose.
most likely, if the query scans the index instead of the whole table.
it is an easy thing to test, become your own scientist.
Both are identical. If you look at the query execution plan for both, both will do an "index scan"

Seek & Scan in SQL Server

After googling i came to know that Index seek is better than scan.
How can I write the query that will yield to seek instead of scan. I am trying to find this in google but as of now no luck.
Any simple example with explanation will be appreciated.
Thanks
Search by the primary key column(s)
Search by column(s) with index(es) on them
An index is a data structure that improves the speed of data retrieval operations on a database table. Most dbs automatically create an index when a primary key is defined for a table. SQL Server creates an index for primary key (composite or otherwise) as a "clustered index", but it doesn't have to be the primary key - it can be other columns.
NOTE:
LIKE '%'+ criteria +'%' will not use an index; LIKE criteria +'%' will
Related reading:
SQL SERVER – Index Seek vs. Index Scan
Index
Which is better: Bookmark/Key Lookup or Index Scan
Extending rexem's feedback:
The clustered index idea for pkeys isn't arbitrary. It's simply a default to make the pkey clustered. And clustered means that values will be physically placed near each other on a Sql Server 8k page thus assuming that if you fetch one value by pkey, you will probably be interested in its neighbors. i don't think it's a good idea to do that for pkeys since they're usually unique but arbitrary identifiers. Better to cluster on more useful data. One clustered index per table btw.
In a nutshell: If you can filter your query on a clustered index column (that makes sense) then all the better.
An index seek is when SQL Server can use a binary search to quickly find the row. The rows in an index are sorted in a particular order, and your query has to specify enough information in the WHERE clause to allow SQL Server to make use of the sorted index.
An index scan is when SQL Server cannot use the sort order of the index, but can still use the index itself. This makes sense if the table rows are very large, but the index is relatively small. SQL Server will only have to read the smaller index from disk.
As a simple example, take a phonebook table:
id int identity primary key
lastname varchar(50)
phonenumber varchar(15)
Say that there is an index on (lastname). Then this query will result in an index seek:
select * from phonebook where lastname = 'leno'
This query will result in an index scan:
select * from phonebook where lastname like '%no'
The analogy with a real life phonebook is that you can't look up people whose name ends in 'no'. You have to browse the entire phonebook.

What is a Covered Index?

I've just heard the term covered index in some database discussion - what does it mean?
A covering index is an index that contains all of, and possibly more, the columns you need for your query.
For instance, this:
SELECT *
FROM tablename
WHERE criteria
will typically use indexes to speed up the resolution of which rows to retrieve using criteria, but then it will go to the full table to retrieve the rows.
However, if the index contained the columns column1, column2 and column3, then this sql:
SELECT column1, column2
FROM tablename
WHERE criteria
and, provided that particular index could be used to speed up the resolution of which rows to retrieve, the index already contains the values of the columns you're interested in, so it won't have to go to the table to retrieve the rows, but can produce the results directly from the index.
This can also be used if you see that a typical query uses 1-2 columns to resolve which rows, and then typically adds another 1-2 columns, it could be beneficial to append those extra columns (if they're the same all over) to the index, so that the query processor can get everything from the index itself.
Here's an article: Index Covering Boosts SQL Server Query Performance on the subject.
Covering index is just an ordinary index. It's called "covering" if it can satisfy query without necessity to analyze data.
example:
CREATE TABLE MyTable
(
ID INT IDENTITY PRIMARY KEY,
Foo INT
)
CREATE NONCLUSTERED INDEX index1 ON MyTable(ID, Foo)
SELECT ID, Foo FROM MyTable -- All requested data are covered by index
This is one of the fastest methods to retrieve data from SQL server.
Covering indexes are indexes which "cover" all columns needed from a specific table, removing the need to access the physical table at all for a given query/ operation.
Since the index contains the desired columns (or a superset of them), table access can be replaced with an index lookup or scan -- which is generally much faster.
Columns to cover:
parameterized or static conditions; columns restricted by a parameterized or constant condition.
join columns; columns dynamically used for joining
selected columns; to answer selected values.
While covering indexes can often provide good benefit for retrieval, they do add somewhat to insert/ update overhead; due to the need to write extra or larger index rows on every update.
Covering indexes for Joined Queries
Covering indexes are probably most valuable as a performance technique for joined queries. This is because joined queries are more costly & more likely then single-table retrievals to suffer high cost performance problems.
in a joined query, covering indexes should be considered per-table.
each 'covering index' removes a physical table access from the plan & replaces it with index-only access.
investigate the plan costs & experiment with which tables are most worthwhile to replace by a covering index.
by this means, the multiplicative cost of large join plans can be significantly reduced.
For example:
select oi.title, c.name, c.address
from porderitem poi
join porder po on po.id = poi.fk_order
join customer c on c.id = po.fk_customer
where po.orderdate > ? and po.status = 'SHIPPING';
create index porder_custitem on porder (orderdate, id, status, fk_customer);
See:
http://literatejava.com/sql/covering-indexes-query-optimization/
Lets say you have a simple table with the below columns, you have only indexed Id here:
Id (Int), Telephone_Number (Int), Name (VARCHAR), Address (VARCHAR)
Imagine you have to run the below query and check whether its using index, and whether performing efficiently without I/O calls or not. Remember, you have only created an index on Id.
SELECT Id FROM mytable WHERE Telephone_Number = '55442233';
When you check for performance on this query you will be dissappointed, since Telephone_Number is not indexed this needs to fetch rows from table using I/O calls. So, this is not a covering indexed since there is some column in query which is not indexed, which leads to frequent I/O calls.
To make it a covered index you need to create a composite index on (Id, Telephone_Number).
For more details, please refer to this blog:
https://www.percona.com/blog/2006/11/23/covering-index-and-prefix-indexes/