Why is there a non-clustered index scan when counting all rows in a table?

As far as I understand it, each transaction sees its own version of the database, so the system cannot get the total number of rows from some counter and thus needs to scan an index. But I thought it would be the clustered index on the primary key, not the additional indexes. If I had more than one additional index, which one will be chosen, anyway?
When digging into the matter, I've noticed another strange thing. Suppose there are two identical tables, Articles and Articles2, each with three columns: Id, View_Count, and Title. The first has only a clustered PK-based index, while the second one has an additional non-clustered, non-unique index on view_count. The query SELECT COUNT(1) FROM Articles runs 2 times faster for the table with the additional index.

SQL Server will optimize your query - if it needs to count the rows in a table, it will choose the smallest possible set of data to do so.
So if you consider your clustered index - it contains the actual data pages - possibly several thousand bytes per row. To load all those bytes just to count the rows would be wasteful - even just in terms of disk I/O.
Therefore, if there is a non-clustered index that's not filtered or restricted in any way, SQL Server will pick that data structure to count, since the non-clustered index basically contains only the columns you've put into the NC index (plus the clustered index key). That is much less data to load just to count the number of rows. If there are several such indexes, the optimizer will generally favor the smallest one.
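One rough way to check this on your own system is to compare how many pages each index occupies and then look at the reads reported for the count. This is only a sketch: it assumes the Articles2 table from the question lives in the dbo schema, and the exact numbers will depend on your data.

```sql
-- Assumption: dbo.Articles2 is the table from the question (schema name is a guess).
-- List every index on the table with the number of pages it uses.
SELECT i.name        AS index_name,
       i.type_desc   AS index_type,
       ps.used_page_count,
       ps.row_count
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID(N'dbo.Articles2');
GO

-- Report logical and physical reads for the count itself.
SET STATISTICS IO ON;
SELECT COUNT(1) FROM dbo.Articles2;
SET STATISTICS IO OFF;
```

If the non-clustered index shows noticeably fewer pages than the clustered index, that would be consistent with the count running faster on the table that has it.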

Related

SQL Server non-clustered index

I have two different queries in SQL Server and I want to clarify
how the execution plan would be different, and
which of them is more efficient
Queries:
SELECT *
FROM table_name
WHERE column < 2
and
SELECT column
FROM table_name
WHERE column < 2
I have a non-clustered index on column.
I used to use PostgreSQL and I am not familiar with SQL Server and these kinds of indexes.
As I read many questions here I kept two notes:
When I have a non-clustered index, I need one more step in order to have access to data
With a non-clustered index I could have a copy of part of the table and I get a quicker response time.
So, I got confused.
One more question: when I use "SELECT *", what is the influence of a non-clustered index?

1st query :
Depending on the size of the data, you might run into lookup overhead such as Key Lookups (table with a clustered index) or RID Lookups (heap).
2nd query:
It will be faster because it does not fetch columns that are not part of the index, though I recommend using a covering index.
I recommend you check this blog post

The first select will use the non-clustered index to find the clustering key (if a clustered index exists) or the page and slot (if there is no clustered index). That locator is then used to get the row. The query plan will differ depending on your statistics (the data distribution).
The second query is "covered" by the non-clustered index. What that means is that the non-clustered index contains all of the data that you are selecting. The clustering key is not needed, and the clustered index and/or heap is not needed to provide data to the select list.
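The difference between the two queries can be sketched as follows. The table, column, and index names here are hypothetical and only stand in for the asker's table_name and column; the plan you actually get depends on how many rows match.

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE dbo.Measurements (
    Id      INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    Reading INT NOT NULL,
    Notes   VARCHAR(200) NULL
);
CREATE NONCLUSTERED INDEX IX_Measurements_Reading
    ON dbo.Measurements (Reading);
GO

SET STATISTICS IO ON;

-- Not covered: Notes is not in the index, so the engine either seeks the
-- index and looks up each row in the clustered index, or scans the table.
SELECT * FROM dbo.Measurements WHERE Reading < 2;

-- Covered: Reading is the index key, so the index alone can answer this.
SELECT Reading FROM dbo.Measurements WHERE Reading < 2;

SET STATISTICS IO OFF;
```

Comparing the logical reads and the actual execution plans of the two statements should show whether a lookup step appears for the first one.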

Index Decreases Number of Rows Read; No performance Gain

I created a non-clustered, non-unique index on a column (date) on a large table (16 million rows), but am getting very similar query speeds when compared to the exact same query that's being forced to not use any indexes.
Query 1 (uses index):
SELECT *
FROM testtable
WHERE date BETWEEN '01/01/2017' AND '03/01/2017'
ORDER BY date
Query 2 (no index):
SELECT *
FROM testtable WITH(INDEX(0))
WHERE date BETWEEN '01/01/2017' AND '03/01/2017'
ORDER BY date
Both queries take the same amount of time to run and return the same result. When looking at the execution plan for each, Query 1's number of rows read is ~4 million rows, whereas Query 2 is reading 106 million rows. It appears that the index is working, but I'm not gaining any performance benefit from it.
Any ideas as to why this is, or how to increase my query speed in this case would be much appreciated.

Create Indexes with Included Columns: Cover index
This topic describes how to add included (or nonkey) columns to extend the functionality of nonclustered indexes in SQL Server by using SQL Server Management Studio or Transact-SQL. By including nonkey columns, you can create nonclustered indexes that cover more queries. This is because the nonkey columns have the following benefits:
They can be data types not allowed as index key columns.
They are not considered by the Database Engine when calculating the number of index key columns or index key size.
An index with nonkey columns can significantly improve query performance when all columns in the query are included in the index either as key or nonkey columns. Performance gains are achieved because the query optimizer can locate all the column values within the index; table or clustered index data is not accessed resulting in fewer disk I/O operations.
CREATE NONCLUSTERED INDEX IX_your_index_name
ON testtable (date)
INCLUDE (col1,col2,col3);
GO

You need to build an index around the need of your query - this quick and free video course should bring you up to speed really quick.
https://www.brentozar.com/archive/2016/10/think-like-engine-class-now-free-open-source/

Why NonClustered index scan faster than Clustered Index scan?

As far as I know, heap tables are tables without a clustered index and have no physical order.
I have a heap table "scan" with 120k rows and I am using this select:
SELECT id FROM scan
If I create a non-clustered index for the column "id", I get 223 physical reads.
If I remove the non-clustered index and alter the table to make "id" my primary key (and so my clustered index), I get 515 physical reads.
If the clustered index table is something like this picture:
Why do Clustered Index Scans work like a table scan (or worse, in the case of retrieving all rows)? Why is it not using the "clustered index table" that has fewer blocks and already has the ID that I need?

SQL Server indices are b-trees. A non-clustered index just contains the indexed columns, with the leaf nodes of the b-tree holding pointers to the appropriate data row. A clustered index is different: its leaf nodes are the data pages themselves, and the clustered index's b-tree becomes the backing store for the table; the heap ceases to exist for the table.
Your non-clustered index contains a single, presumably integer column. It's a small, compact index to start with. Your query select id from scan has a covering index: the query can be satisfied just by examining the index, which is what is happening. If, however, your query included columns not in the index, then, assuming the optimizer elected to use the non-clustered index, an additional lookup would be required to fetch the data pages, either from the clustered index or from the heap.
To understand what's going on, you need to examine the execution plan selected by the optimizer:
See Displaying Graphical Execution Plans
See Red Gate's SQL Server Execution Plans, by Grant Fritchey

A clustered index is generally about as big as the same data in a heap would be (assuming the same page fullness). It should need only slightly more reads than a heap would because of the additional B-tree levels.
A CI cannot be smaller than a heap would be. I don't see why you would think that. Most of the size of a partition (be it a heap or a tree) is in the data.
Note that fewer physical reads do not necessarily translate to a faster query. Random IO can be 100x slower than sequential IO.

When to use Clustered Index-
Query Considerations:
1) Return a range of values by using operators such as BETWEEN, >, >=, <, and <=
2) Return large result sets
3) Use JOIN clauses; typically these are foreign key columns
4) Use ORDER BY, or GROUP BY clauses. An index on the columns specified in the ORDER BY or GROUP BY clause may remove the need for the Database Engine to sort the data, because the rows are already sorted. This improves query performance.
Column Considerations :
Consider columns that have one or more of the following attributes:
1) Are unique or contain many distinct values
2) Defined as IDENTITY because the column is guaranteed to be unique within the table
3) Used frequently to sort the data retrieved from a table
Clustered indexes are not a good choice for the following attributes:
1) Columns that undergo frequent changes
2) Wide keys
When to use Nonclustered Index-
Query Considerations:
1) Use JOIN or GROUP BY clauses. Create multiple nonclustered indexes on columns involved in join and grouping operations, and a clustered index on any foreign key columns.
2) Queries that do not return large result sets
3) Contain columns frequently involved in search conditions of a query, such as WHERE clause, that return exact matches
Column Considerations :
Consider columns that have one or more of the following attributes:
1) Cover the query. For more information, see Index with Included Columns
2) Lots of distinct values, such as a combination of last name and first name, if a clustered index is used for other columns
3) Used frequently to sort the data retrieved from a table
Database Considerations:
1) Databases or tables with low update requirements, but large volumes of data can benefit from many nonclustered indexes to improve query performance.
2) Online Transaction Processing applications and databases that contain heavily updated tables should avoid over-indexing. Additionally, indexes should be narrow, that is, with as few columns as possible.

Try running
DBCC DROPCLEANBUFFERS
before the queries if you really want to compare them.
Physical reads don't mean the same as logical reads when optimizing a query.
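A cold-cache comparison could look roughly like the sketch below. It reuses the scan table from the question; run it only on a test or development server, because clearing the buffer pool affects the whole instance.

```sql
-- Test or development server only: this empties the buffer pool instance-wide.
CHECKPOINT;              -- write dirty pages to disk so they become clean
DBCC DROPCLEANBUFFERS;   -- remove clean pages from the buffer pool
GO

-- With a cold cache, the physical reads reported here are comparable between runs.
SET STATISTICS IO ON;
SELECT id FROM scan;
SET STATISTICS IO OFF;
```

Repeat the same steps after each index change so that both measurements start from an empty cache. The logical reads figure is usually the steadier one to compare, since it does not depend on what happens to be cached.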

Does clustered index sort order have impact on performance

If a PK of a table is a standard auto-increment int (Id) and the retrieved and updated records are almost always the ones closer to the max Id will it make any difference performance-wise whether the PK clustered index is sorted as ascending or descending?
When such PK is created, SSMS by default sets the sort order of the index as ascending and since the rows most accessed are always the ones closer to the current max Id, I'm wondering if changing the sorting to descending would speed up the retrieval since the records will be sorted top-down instead of bottom-up and the records close to the top are accessed most frequently.

I don't think there will be any performance hit, since it's going to search for the index key and then access the specific data block with that key. Either way, that search has O(log N) complexity. So in total it is O(log N) + 1, and since it's a clustered index, it should actually be just O(log N), because the table records are physically ordered by the key instead of sitting behind a separate index page/block.

Indexes use a B-tree structure, so no. But if you have an index that is based on multiple columns, you want the most distinct columns on the outer level (first in the key) and the least distinct on the inner levels. For example, if you had 2 columns (gender and age), you would want age on the outer and gender on the inner, because there are only 2 possible genders, whereas there are many more ages. This will impact performance.
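As a minimal sketch of the column order described above, assuming a hypothetical People table (all names are invented for illustration):

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE dbo.People (
    Id     INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    Age    TINYINT NOT NULL,
    Gender CHAR(1) NOT NULL
);

-- Age (many distinct values) leads the key; Gender (few distinct values) follows.
CREATE NONCLUSTERED INDEX IX_People_Age_Gender
    ON dbo.People (Age, Gender);
GO

-- The filter includes the leading key column, so an index seek is possible.
SELECT Id FROM dbo.People WHERE Age = 30 AND Gender = 'F';

-- Only the second key column is filtered, so the index cannot be seeked
-- on its leading column and would have to be scanned instead.
SELECT Id FROM dbo.People WHERE Gender = 'F';
```

Whichever order you choose, it is worth checking it against the filters your queries actually use, since a query that skips the leading column cannot seek into the index.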

How to interpret my ShowPlan(Execution Plan)

I have an Access 2003 DB and I want to improve its performance. Yesterday, I read an article about Execution Plan (Show Plan) and today I ran my show Plan for this query:
SELECT tb_bauteile_Basis.*
FROM tb_bauteile_Basis
ORDER BY tb_bauteile_Basis.Name;
I put an index on the Name field, and show its query plan:
- Inputs to Query -
Table 'tb_bauteile_Basis'
Using index 'Name'
Having Indexes:
Name 1553 entries, 17 pages, 1543 values
which has 1 column, fixed
ID 1553 entries, 4 pages, 1553 values
which has 1 column, fixed, clustered and/or counter
- End inputs to Query -
01) Scan table 'tb_bauteile_Basis'
Using index 'Name'
Next, I deleted the index from Name, and the new query plan is:
- Inputs to Query -
Table 'tb_bauteile_Basis'
Using index 'PrimaryKey'
Having Indexes:
PrimaryKey 1553 entries, 4 pages, 1553 values
which has 1 column, fixed, unique, clustered and/or counter, primary-key, no-nulls
Plauskomponente 1553 entries, 4 pages, 3 values
which has 1 column, fixed
Name 1553 entries, 17 pages, 1543 values
which has 1 column, fixed
ID 1553 entries, 4 pages, 1553 values
which has 1 column, fixed, clustered and/or counter
- End inputs to Query -
01) Scan table 'tb_bauteile_Basis'
Using index 'PrimaryKey'
How should I interpret these two query plans?
Regarding the second ShowPlan: does it mean I should create indexes on Plauskomponente, Name, and ID, or should I make a composite index from these three fields? How can I determine whether I should make a composite index?
And why doesn't Plauskomponente appear in the first ShowPlan?

In the second ShowPlan, does that mean I should put indexes on Plauskomponente, Name, and ID? And should I make a composite index from these three fields? How can I find out whether I should make a composite index?
Those names appear in the ShowPlan section which shows the information Jet analyses when devising the query plan. Indexes on those 3 fields, either separate indexes or a single composite index based on all three, would not help that particular query. And actually adding indexes will slow down other operations ... when you add or delete rows, or edit values in indexed fields, the db engine must write changes to the table and to the composite index.
Optimizing indexing can be tricky. Indexes can speed up SELECT, but slow down INSERT, DELETE and UPDATE operations. You need to find the right balance for your application. If you're relatively new at this in Access, try the Performance Analyzer wizard from the database tools section of the menu. Examine the suggestions it offers. Many times those suggestions will involve indexes. You can add indexes it suggests, then drop them later if they degrade or don't improve your overall performance.
Why doesn't Plauskomponente appear in the first ShowPlan?
Beats me. I would guess the query planner already found the Name index, so considered it pointless to look for other indexes.
However, that brings up another important point. The table includes 1,553 rows, but Plauskomponente contains only 3 different values. With such low variability, an index on Plauskomponente would probably not even be used in a plan for a query which had a WHERE clause based on Plauskomponente. @Namphibian touched on the reason in a comment. Reading the index to find out which rows match the criterion, then reading the matching rows, could be considered more costly than just ignoring the index and reading all the rows from the table.
Finally, notice the statistics mentioned in the ShowPlan information sections. Those statistics are updated when you compact the database. So compacting is useful to give the query planner the latest information to use as it makes its decisions about how to optimize the query plan.

The query is reading the entire table. You don't have a WHERE clause, so it's unlikely you will be able to optimize it. What you could do is select only the columns you need. This reduces the amount of data being passed back. Adding an index will not speed up the query, as you don't have a WHERE clause. If you added a WHERE clause and indexed the fields used in it, you might be able to optimize the query.
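For example, if only the identifier and the name were needed, the query could be narrowed as in the sketch below. It assumes ID and Name are column names, which the ShowPlan output suggests but does not confirm, since it lists index names.

```sql
SELECT tb_bauteile_Basis.ID, tb_bauteile_Basis.Name
FROM tb_bauteile_Basis
ORDER BY tb_bauteile_Basis.Name;
```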
