is count(indexed column) faster than count(*)? [duplicate] - sql

Possible Duplicate:
Performance of COUNT SQL function
Hi all,
I have very large tables and I need to know the number of records in each. My question is: does it reduce the run time if I run:
select count(indexed_col) from tbTest   -- indexed_col being an indexed column, e.g. my PK
instead of
select count(*) from tbTest

See Performance of COUNT SQL function.
The important thing to note is that they are not equivalent: COUNT(*) counts every row, while COUNT(column) counts only rows where that column is not NULL (which, for a non-nullable PK, works out to the same number).

Since the question is whether or not there is a performance difference, it depends on the index. When you do COUNT(*), SQL Server will use the PK column(s) to determine the number of rows. If you do not have any indexes besides a clustered index on the PK column(s), it will scan the leaf nodes of the clustered index. That is probably a lot of pages. If you have a non-clustered index that is skinnier than the clustered index, it will choose that instead, resulting in fewer reads.
So, if the column you select is contained in the smallest possible non-clustered index on the table, the query optimizer will choose that index for both COUNT(*) (if you have a clustered index that is the PK) and COUNT(indexed_column). If you choose a COUNT(indexed_col) on a column that is only contained in a wide index, then COUNT(*) will be faster if your PK is a clustered index. The reason this works is that every non-clustered index contains a pointer to the clustered index, so SQL Server can figure out the number of rows from that non-clustered index.
So, as usual in SQL Server, it depends. Do a showplan and compare the queries to each other.
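A minimal way to run that comparison yourself, assuming the asker's tbTest table and a placeholder column name indexed_col (SET STATISTICS IO reports the reads for each statement; you can also enable the actual execution plan in SSMS to see which index each count scans):
SET STATISTICS IO ON;
SELECT COUNT(*) FROM tbTest;            -- optimizer is free to pick the narrowest index
SELECT COUNT(indexed_col) FROM tbTest;  -- tied to that column (and skips NULLs in it)
SET STATISTICS IO OFF;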

SELECT COUNT(*) may be faster. That is because using * gives the optimizer the liberty to choose any column to count on. Say you have a primary key on an INT column, and a non-clustered index on a different BIGINT column. The primary key is likely the clustered index, and as such it is in fact significantly larger than the non-clustered BIGINT index (it has more pages). So if the optimizer is free to choose the BIGINT non-clustered index, it can return the response faster. Possibly much faster, depending on the table.
So overall it is always better to leave it as COUNT(*) and let the optimizer choose.

Most likely yes, if the query scans the index instead of the whole table.
It is an easy thing to test; become your own scientist.

Both are identical. If you look at the query execution plan for each, both will do an "index scan".

Related

Optimize the Clustered Index Scan into Clustered Index Seek

Here is the scenario: I have a table with 40 columns and I have to select all the data in the table (including all columns). I have created a clustered index on the table, and the plan shows a Clustered Index Scan while fetching the full data set from the table.
I know that without any filter or join key, SQL Server will choose a Clustered Index Scan instead of a Clustered Index Seek. But I want to optimize the execution plan by turning the Clustered Index Scan into a Clustered Index Seek. Is there any way to achieve this? Please share.
Below is the screenshot of the execution plan:
Something is not quite right in the question / request, because what you are asking for will perform badly. I suspect it comes from misunderstanding what a clustered index is.
The clustered index - which is perhaps better described as a clustered table - is the table of data; it is not separate from the table, it is the table. If the order of the data in the table is already based on ITEM ID, then the scan is the most efficient access method for your query (especially given the select *) - you do not want to seek in this scenario at all - and I don't believe that is your scenario, due to the sort operator.
If the clustered table is ordered on another field, then you would need an additional non-clustered index to provide the correct order. You would then try to force a plan that is a non-clustered index scan, nested loop to a clustered index seek. That can be achieved using query hints; most likely an INNER LOOP JOIN hint would cause the seek, but a FORCESEEK hint also exists which can be used.
Performance-wise, this second option is never going to win - you are in effect up against the tipping point (https://www.sqlskills.com/blogs/kimberly/the-tipping-point-query-answers/)
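For completeness, a sketch of that second option using the FORCESEEK hint mentioned above; the table and column names are placeholders, and as noted it will generally lose to the plain clustered index scan for a query that returns every row:
SELECT *
FROM dbo.MyWideTable WITH (FORCESEEK)   -- placeholder table name
WHERE ItemId >= 0;                      -- FORCESEEK needs a seekable predicate to be usable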
Well, I was trying to achieve the same thing: I wanted an index seek instead of an index scan for my TOP query.
SELECT TOP 5 id FROM mytable
Here is the execution plan shown for the query:
I even tried the OFFSET ... FETCH NEXT approach; the plan was the same.
To avoid an index scan, I included a fake primary key filter like the one below:
SELECT TOP 5 id FROM mytable where id != 0
I know I won't have a 0 value in my primary key, so I added that filter to the TOP query, which resolved to an index seek instead of an index scan:
The query plan comparison shows the operator costs as similar for the index seek and the index scan in this case. But I think that achieving an index seek this way adds an extra operation for the database to perform, because it has to check whether the id is 0 or not, which we do not need it to do at all if we just want the top few records.

SQL Server non-clustered index

I have two different queries in SQL Server and I want to clarify
how the execution plan would be different, and
which of them is more efficient
Queries:
SELECT *
FROM table_name
WHERE column < 2
and
SELECT column
FROM table_name
WHERE column < 2
I have a non-clustered index on column.
I used to use PostgreSQL and I am not familiar with SQL Server and these kinds of indexes.
As I read many questions here, I kept two notes:
When I have a non-clustered index, I need one more step in order to have access to data
With a non-clustered index I could have a copy of part of the table and I get a quicker response time.
So, I got confused.
One more question: when I have "SELECT *", what is the influence of a non-clustered index?
1st query:
Depending on the size of the data, you might face lookup issues such as key lookups and RID lookups.
2nd query:
It will be faster because it will not fetch columns that are not part of the index, though I recommend using a covering index.
I recommend you check this blog post
The first select will use the non-clustered index to find the clustering key [if a clustered index exists] or the page and slot [if there is no clustered index]. Then that will be used to get the row. The query plan will be different depending on your STATS (the data).
The second query is "covered" by the non-clustered index. What that means is that the non-clustered index contains all of the data that you are selecting. The clustering key is not needed, and the clustered index and/or heap is not needed to provide data to the select list.
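As a sketch of what "covered" means here, using the table_name and column names from the question (the INCLUDE list in the second index is only illustrative):
-- The index from the question: on its own it covers SELECT [column] ... WHERE [column] < 2.
CREATE NONCLUSTERED INDEX IX_table_name_column
ON dbo.table_name ([column]);
-- A hypothetical covering variant for the SELECT * query; the included columns are
-- placeholders for the remaining columns of the table.
CREATE NONCLUSTERED INDEX IX_table_name_column_covering
ON dbo.table_name ([column])
INCLUDE (other_col1, other_col2);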

Why is a NonClustered index scan faster than a Clustered Index scan?

As I understand it, heap tables are tables without a clustered index and have no physical order.
I have a heap table "scan" with 120k rows and I am using this select:
SELECT id FROM scan
If I create a non-clustered index for the column "id", I get 223 physical reads.
If I remove the non-clustered index and alter the table to make "id" my primary key (and so my clustered index), I get 515 physical reads.
If the clustered index table is something like this picture:
Why does a Clustered Index Scan work like a table scan (or worse, in the case of retrieving all rows)? Why is it not using the "clustered index table", which has fewer blocks and already has the ID that I need?
SQL Server indices are b-trees. A non-clustered index just contains the indexed columns, with the leaf nodes of the b-tree being pointers to the appropriate data page. A clustered index is different: its leaf nodes are the data pages themselves, and the clustered index's b-tree becomes the backing store for the table itself; the heap ceases to exist for the table.
Your non-clustered index contains a single, presumably integer column. It's a small, compact index to start with. Your query select id from scan has a covering index: the query can be satisfied just by examining the index, which is what is happening. If, however, your query included columns not in the index, assuming the optimizer elected to use the non-clustered index, an additional lookup would be required to fetch the data pages required, either from the clustering index or from the heap.
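A sketch of the setup described in the question, using its table and column names; the narrow non-clustered index below is what makes SELECT id FROM scan a covered query:
-- Heap table "scan" with no clustered index; only a small non-clustered index on id.
CREATE NONCLUSTERED INDEX IX_scan_id ON dbo.scan (id);
-- Covered by IX_scan_id: only the compact index pages need to be read.
SELECT id FROM dbo.scan;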
To understand what's going on, you need to examine the execution plan selected by the optimizer:
See Displaying Graphical Execution Plans
See Red Gate's SQL Server Execution Plans, by Grant Fritchey
A clustered index is generally about as big as the same data in a heap would be (assuming the same page fullness). It should use just a few more reads than a heap would, because of the additional B-tree levels.
A CI cannot be smaller than a heap would be. I don't see why you would think that. Most of the size of a partition (be it a heap or a tree) is in the data.
Note that fewer physical reads do not necessarily translate to a query being faster. Random IO can be 100x slower than sequential IO.
When to use Clustered Index-
Query Considerations:
1) Return a range of values by using operators such as BETWEEN, >, >=, <, and <=
2) Return large result sets
3) Use JOIN clauses; typically these are foreign key columns
4) Use ORDER BY, or GROUP BY clauses. An index on the columns specified in the ORDER BY or GROUP BY clause may remove the need for the Database Engine to sort the data, because the rows are already sorted. This improves query performance.
Column Considerations :
Consider columns that have one or more of the following attributes:
1) Are unique or contain many distinct values
2) Defined as IDENTITY because the column is guaranteed to be unique within the table
3) Used frequently to sort the data retrieved from a table
Clustered indexes are not a good choice for the following attributes:
1) Columns that undergo frequent changes
2) Wide keys
When to use Nonclustered Index-
Query Considerations:
1) Use JOIN or GROUP BY clauses. Create multiple nonclustered indexes on columns involved in join and grouping operations, and a clustered index on any foreign key columns.
2) Queries that do not return large result sets
3) Contain columns frequently involved in search conditions of a query, such as WHERE clause, that return exact matches
Column Considerations :
Consider columns that have one or more of the following attributes:
1) Cover the query. For more information, see Index with Included Columns
2) Lots of distinct values, such as a combination of last name and first name, if a clustered index is used for other columns
3) Used frequently to sort the data retrieved from a table
Database Considerations:
1) Databases or tables with low update requirements, but large volumes of data can benefit from many nonclustered indexes to improve query performance.
2) Online Transaction Processing applications and databases that contain heavily updated tables should avoid over-indexing. Additionally, indexes should be narrow, that is, with as few columns as possible.
Try running
DBCC DROPCLEANBUFFERS
before the queries, if you really want to compare them.
Physical reads don't mean the same thing as logical reads when you are optimizing a query.
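A minimal test protocol along those lines, assuming a non-production server (DBCC DROPCLEANBUFFERS empties the buffer pool for the whole instance); the table and column names come from the question:
CHECKPOINT;                 -- write dirty pages out so the next command clears everything
DBCC DROPCLEANBUFFERS;      -- force subsequent reads to be physical reads
SET STATISTICS IO ON;       -- report logical and physical reads per statement
SET STATISTICS TIME ON;
SELECT id FROM dbo.scan;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;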

Optimizing my SQL queries - picking the right indexes

I have a basic table as follows.
create table Orders
(
ID INT IDENTITY(1,1) PRIMARY KEY,
Company VARCHAR(3),
ItemID INT,
BoxID INT,
OrderNum VARCHAR(5),
Status VARCHAR(5),
--about 10 more columns, varchars and ints and dates
)
I'm trying to optimize all my SQL since I am getting a fair few deadlocks and some slowness - but I'm no expert on this sort of thing!
I created a few indexes:
Clustered on the ID (Primary Key).
Non-Clustered index on ([ItemID])
Non-Clustered index on ([BoxID])
Non-Clustered index on ([Company],[OrderNum],[Status])
Maybe 1 or 2 more on some other columns
But I'm not 100% happy with the results.
SELECT * FROM Orders WHERE ItemID=100
Gives me an index seek + a key lookup and a Nested loop (Inner join).
I can see why - but don't know if I should do anything about it. The key lookup is 97% of the batch, which seems bad!
Every query used will pull back every column in the table, but I don't like the idea of including every column in the index.
I'm making a change now to query everything on the [Company] field. Every query will be using it, because results should never contain more than one Company value. So they will all change:
SELECT * FROM Orders WHERE ItemID=100 --Old
SELECT * FROM Orders WHERE Company='a' and ItemID=100 --New
But the execution plan for that is exactly the same as the one without Company (which does surprise me!).
Why are the two execution plans above the same? (I have no index on [Company] at the moment.)
Is it worth adding [Company] to all my indexes, since it seems to make no difference to the execution plan?
Should I instead just add a single index on [Company] and keep the original indexes? But will that mean every query will have 2 seeks?
Is it worth 'including' all other columns in my indexes to avoid the key lookup (making the index a tonne bigger, but potentially speeding it up)? i.e.
CREATE NONCLUSTERED INDEX [IX_Orders_MyIndex] ON [Orders]
( [Company] ASC, [OrderNum] ASC, [Status] ASC )
INCLUDE ([ID],[ItemID],[BoxID],
[Column5],[Column6],[Column7],[Column8],[Column9],[Column10],etc)
That seems messy if I did it on 4 or 5 indexes.
Basically I have 4-5 queries which run quite often (some selects and updates) so I want to make it as efficient as possible.
All queries will use the [company] field, and at least 1 other. How should I go about it?
Any help appreciated :)
In your execution plan, you say that the lookup takes 97% of the batch.
In this case that doesn't mean much, because an index seek is very fast and there isn't much other work to be done.
That lookup is actually the read of the record, based on the index you have specified.
Why are the two execution plans above the same? (I have no index on [company] at the moment)
Non-Clustered index on ([Company],[OrderNum],[Status])
This index will be considered only if Company, OrderNum and Status appear in your where clause.
A concatenated index generates a key that looks something like 0000000000000; when you pass only Company, it creates an incomplete key that requires a wildcard for the other two values.
It would look a little like key LIKE 'XXX%', and that logic requires an index scan, which is time consuming.
The optimizer will determine that it is preferable to first seek the rows from the ItemID index and then scan those to match any with the required Company.
Is it worth adding [Company] to all my indexes, since it seems to make no difference to the execution plan?
You should consider having a Company index instead of adding it to all your indexes.
A composite index could speed things up by reducing the number of nested loops, but you have to think it through thoroughly.
The order of the fields you add to such an index is very important; they should be ordered by uniqueness to allow a better seek. Also, you should never add a field that might not be used in a query.
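A sketch of the single-column index being suggested, plus an illustrative composite; the column order in the composite is an assumption (whichever column is more selective should lead), not something from the original answer:
-- Single-column index on Company, as suggested above.
CREATE NONCLUSTERED INDEX IX_Orders_Company ON dbo.Orders (Company);
-- Illustrative composite: lead with the more selective column (assumed here to be ItemID).
CREATE NONCLUSTERED INDEX IX_Orders_ItemID_Company ON dbo.Orders (ItemID, Company);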
Should I instead just add a single index on [Company] and keep the original indexes? But will that mean every query will have 2 seeks?
Having more than one index seek is not all that bad; they usually run in parallel and only the results of both are matched together.
Is it worth 'including' all other columns in my indexes to avoid the key lookup? (making the index a tonne bigger, but potentially speeding it up?)
It is worth it when only a few fields could be optional in the where clause, or when you have queries that select only those fields while using the specified index.
Last notes
All indexes are not equal: comparing strings (varchar) is not the same as comparing numbers (integer, datetime, bytes, etc.).
Also, keeping them clean helps a lot; if your indexes are fragmented, they will be next to useless in terms of performance gain.
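On the fragmentation point, a hedged housekeeping sketch (these are standard SQL Server maintenance commands, not something from the original answer; the 30% figure is a commonly quoted rule of thumb):
-- Reorganize is lightweight and always online.
ALTER INDEX ALL ON dbo.Orders REORGANIZE;
-- For heavily fragmented indexes (roughly > 30% fragmentation), rebuild instead:
-- ALTER INDEX ALL ON dbo.Orders REBUILD;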

sql server query performance where clause

I have a table like this with 10,000 rows.
declare @a table
(
id bigint not null,
nm varchar(100) not null,
filter bigint,
primary key (id)
)
A select with 4-5 joins takes x seconds. If a where clause is added, it now takes 3x seconds. The where clause:
filter = @filder or
filter is null
I applied a nonclustered index on the column, but I'm only getting about a 10% improvement in performance.
Any tips?
Edit: the performance issue happens when the filter column is added. All joins are on primary keys.
I have a few thoughts on this:
Chances are that your joins are joining on table.id, which is a primary key and has an index - bingo - high selectivity (because the values are unique). With it being indexed, the optimizer can really optimize access to this table when it is used in joins.
I'm not 100% sure, but either you do not have an index on filter or it is not selective enough. If you do not have an index, the optimizer will use a table scan. If you do have an index, but it is not selective enough, it will use a table scan anyway. Scans are expensive.
Even if you do have an index on filter, the optimizer does not like OR predicates. Basically, when using an OR the optimizer might end up using an index scan instead of an index seek. Try using this instead: @filder = ISNULL(filter, @filder), as @sut13 suggested.
So to improve the performance: add an index on filter if you do not have one, and adjust your where clause to not use OR, as I have suggested.
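Spelled out against the table variable from the question (@filder is the asker's parameter, the value below is illustrative, and note the caveat in the last answer that wrapping the column in ISNULL can itself prevent a seek):
DECLARE @filder bigint = 42;   -- illustrative value; assumes @a is declared as in the question
-- Original form: the OR tends to push the optimizer toward a scan.
SELECT id, nm, filter FROM @a
WHERE filter = @filder OR filter IS NULL;
-- Suggested rewrite: equivalent as long as @filder is not null.
SELECT id, nm, filter FROM @a
WHERE @filder = ISNULL(filter, @filder);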
Also:
You shouldn't expect the query with the where filter to perform as well as or better than the query with just the 4-5 joins. If the query with the joins is more selective and makes better use of indexes, it is going to perform better.
It's likely that the lack of an index on the filter column (based on the structure you described) is leading to a table scan. The only way to be sure is to take a look at the execution plan for the query. That will tell you what the optimizer is doing with the query and usually give you enough information to understand why it's doing that and what you need to do to fix it.
You probably need an index on the filter column. But the 'OR filter IS NULL' might lead to a scan anyway, depending on how many null values are in the data.
If you use the ISNULL as outlined, unfortunately, that's a function on the column and will probably (depending on the indexes used and other columns in the WHERE clause that can be used to initially filter the data, etc.) result in a scan and not a seek.