Does clustered index sort order have impact on performance - sql

If the PK of a table is a standard auto-increment int (Id), and the records retrieved and updated are almost always the ones closest to the max Id, will it make any difference performance-wise whether the PK clustered index is sorted ascending or descending?
When such a PK is created, SSMS by default sets the sort order of the index to ascending. Since the rows accessed most are always the ones closest to the current max Id, I'm wondering whether changing the sort order to descending would speed up retrieval, since the records would be sorted top-down instead of bottom-up and the records close to the top are accessed most frequently.

I don't think there will be any performance difference. Either way, a lookup by key walks the B-tree from the root down to the leaf page that holds the key, and that search is O(log N) whether the index is sorted ascending or descending. Because it's a clustered index, the leaf level holds the table rows themselves in key order, so there is no extra hop from a separate index page/block to the data.
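If you want to measure this rather than reason about it, a minimal T-SQL sketch follows. The table names (dbo.EventsAsc, dbo.EventsDesc), the Payload column and the constraint names are hypothetical and only illustrate the two declarations being compared.

```sql
-- Hypothetical tables: identical except for the sort direction of the clustered PK.
CREATE TABLE dbo.EventsAsc (
    Id      int IDENTITY(1,1) NOT NULL,
    Payload varchar(100)      NULL,
    CONSTRAINT PK_EventsAsc PRIMARY KEY CLUSTERED (Id ASC)
);

CREATE TABLE dbo.EventsDesc (
    Id      int IDENTITY(1,1) NOT NULL,
    Payload varchar(100)      NULL,
    CONSTRAINT PK_EventsDesc PRIMARY KEY CLUSTERED (Id DESC)
);

-- After loading the same data into both tables, compare the reads reported for
-- the same "most recent rows" query against each one.
SET STATISTICS IO ON;

SELECT TOP (10) Id, Payload FROM dbo.EventsAsc  ORDER BY Id DESC;
SELECT TOP (10) Id, Payload FROM dbo.EventsDesc ORDER BY Id DESC;

SET STATISTICS IO OFF;
```

The logical reads printed for each statement, together with the actual execution plans, show whether the sort direction matters for your workload.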

Indexes use a B-tree structure, so no. But if you have an index based on multiple columns, you generally want the most distinct (most selective) columns first in the key and the least distinct ones last. For example, with two columns (gender and age), you would want age first and gender second, because there are only a couple of possible genders, whereas there are many more ages. Column order does impact performance.
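The age/gender example above could be declared roughly as follows. This is a sketch only: dbo.People, its column types and the index name are made up for illustration.

```sql
-- Hypothetical table for the age/gender example.
CREATE TABLE dbo.People (
    Id     int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    Age    tinyint           NOT NULL,
    Gender char(1)           NOT NULL
);

-- Age (many distinct values) is the first key column, Gender (few distinct values) the second.
CREATE NONCLUSTERED INDEX IX_People_Age_Gender
    ON dbo.People (Age, Gender);

-- A query filtering on both columns can seek on this index.
SELECT Id
FROM dbo.People
WHERE Age = 30
  AND Gender = 'F';
```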

Related

what is the minimum time Order of a query in SQL databases?

I want to know the minimum query time in a given SQL (especially SQLite) database with n records.
I know that a full table scan is O(n), and that a lookup on an indexed column (or RowId) is O(log(n)).
1st question: is there any situation where the time is smaller than O(log(n))?
2nd question: why is querying on RowId (SELECT * FROM table_01 WHERE rowid='234') also O(log(n))? If RowId is ordered from 1 to n, I would logically expect that SQL can immediately find the row with a given RowId.
Finding a specific row requires a search. (Not every rowid is necessarily present, so the database needs to look.) The optimistic case, or even the average case, should be much faster than log(n), but the worst case cannot be, since it requires searching a list.
If you want to retrieve the smallest or largest value from an indexed column (SELECT MIN(x) FROM table), the database can simply read the first or last value, and the time is in O(1).
Indexes are stored as a B-tree, with the indexed columns as the key.
Tables are stored as a B-tree, with the rowid as the key, so searching for the rowid is just as fast as searching for a value in an index.
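To see which access path SQLite picks for each of these cases, you can ask it for the query plan. The sketch below uses the table name from the question; the columns x and y and the index name are assumptions made for illustration.

```sql
-- SQLite dialect. Hypothetical schema: an ordinary rowid table plus one index.
CREATE TABLE table_01 (x INTEGER, y TEXT);
CREATE INDEX idx_table_01_x ON table_01 (x);

-- Lookup by rowid: a search of the table's own B-tree.
EXPLAIN QUERY PLAN SELECT * FROM table_01 WHERE rowid = 234;

-- Lookup by indexed column: a search of the index B-tree.
EXPLAIN QUERY PLAN SELECT * FROM table_01 WHERE x = 234;

-- No usable index: a full scan of the table.
EXPLAIN QUERY PLAN SELECT * FROM table_01 WHERE y = 'abc';

-- Smallest indexed value: served from the index rather than a table scan.
EXPLAIN QUERY PLAN SELECT MIN(x) FROM table_01;
```

The exact wording of the plan output differs between SQLite versions, but a SEARCH step indicates a B-tree lookup and a SCAN step indicates that every row is visited.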

SQL Server Update Where clause performance with clustered index

I have an update query like the one below that updates AccessDate only when the current value is less than the one passed in. The table has a clustered index on Id.
Is there any use in having another non-clustered index on (Id, AccessDate)?
UPDATE Person
SET AccessDate = @NewAccessDate
WHERE Id = @Id
AND AccessDate < @NewAccessDate
Under most circumstances, I would say that the update would be faster without the index. The key consideration is that the index itself would also need to be updated by the statement.
The one mitigating factor is when each id has lots and lots of AccessDates, and very, very few that are less than @NewAccessDate. For instance, if there were 10,000 rows per id and only 1 matched the condition, then updating the index is probably faster than scanning all the access dates.
Or, similarly, if most ids had no matching records for the WHERE clause.
I'm not sure what the cutoff value is for when one is better or not -- it would depend on other factors, such as your hardware and the number of records per page. But given that there is a trade-off, you are probably safe not putting in the index.
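For reference, here is a self-contained sketch of the statement under discussion. The variable types and values are assumptions (the question does not give the column types), and the index name in the commented-out line is hypothetical.

```sql
-- Assumed types and sample values, for illustration only.
DECLARE @Id int = 42;
DECLARE @NewAccessDate datetime = '20240115';

-- The clustered index on Id locates the candidate row(s); the AccessDate
-- comparison is then checked on just those rows.
UPDATE Person
SET AccessDate = @NewAccessDate
WHERE Id = @Id
  AND AccessDate < @NewAccessDate;

-- Number of rows actually changed (0 when the stored date was already newer).
SELECT @@ROWCOUNT AS RowsUpdated;

-- The extra index being debated. Every update of AccessDate would also have
-- to maintain it, which is the cost the answer above refers to.
-- CREATE NONCLUSTERED INDEX IX_Person_Id_AccessDate ON Person (Id, AccessDate);
```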

SQL Server range indexing ideas

I need help understanding how to create proper indexing on a table for fast range selects.
I have a table with the following columns:
Column --- Type
frameidx --- int
u --- int
v --- int
x --- float(53)
y --- float(53)
z --- float(53)
None of these columns is unique.
There are to be approximately 30 million records in this table.
An average query would look something like this:
SELECT x, y, z FROM tablename
WHERE
frameidx = 4 AND
u BETWEEN 34 AND 500 AND
v BETWEEN 0 AND 200
Pretty straightforward: no joins, no nested stuff. Just good ol' subset selection.
What sort of indexing should I do in MS SQL Server (2012) for this table in order to be able to fetch records (which can number in the thousands for this query) in, ideally, less than 100 ms?
Thanks.
If you don't have indices, SQL Server needs to scan the whole table to find the required data. For such a big table (30M rows), that's time consuming.
If you have indices appropriate for your query, the SQL server will seek them (i.e. it will quickly find the required rows in the index, using the index structure). The index consists of the indexed column values, in the given index order, and pointers to the rows in the indexed table, so once the data is found in the index, the necessary data from the indexed table is recovered using those pointers.
So, if you want to speed things up, you need to create indexes on the columns you're going to use to filter the ranges.
Adding indexes will improve the query response time, but will also take up more space and make insertions slower. So you shouldn't create a lot of indexes.
If you're going to filter on all of these columns all the time, you should create only one index. Ideally, that index should lead with the most selective column, i.e. the one that has the most distinct values (the fewest repeated values). Typically only one index per table is used for a query.
If you're going to use different sets of range filters, you should create more indexes.
Using a composite index can be good or bad. In a composite index, the rows are ordered by all of the columns in the index. So, provided you index by A, B, C and D, filtering or ordering by A will give consecutive rows of the index, and it's a quick operation. Filtering by A, B, C and D is ideal for this index. However, filtering or ordering only by D is the worst case for this index, because it has to retrieve data spread all over the index: remember that the data is ordered by A, then B, then C, then D, so the D values are scattered throughout the index. Depending on several factors (table stats, index selectivity, and so on), it's even possible that no index is used at all and the table is scanned.
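Applied to the query in the question, a composite index might look like the sketch below. The index name is made up, and the INCLUDE clause is an addition not mentioned above: it stores x, y and z in the index leaf so the query can be answered without going back to the table.

```sql
-- Equality column first (frameidx), then the range columns.
CREATE NONCLUSTERED INDEX IX_tablename_frameidx_u_v
    ON tablename (frameidx, u, v)
    INCLUDE (x, y, z);   -- assumption: makes the index cover the SELECT list

-- The seek narrows to frameidx = 4 and the range of u;
-- the v condition is then checked on the rows inside that range.
SELECT x, y, z
FROM tablename
WHERE frameidx = 4
  AND u BETWEEN 34 AND 500
  AND v BETWEEN 0 AND 200;
```

Whether this reaches the 100 ms target depends on the data distribution, so check the actual execution plan and timings on your own table.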
A final note on the clustered index: a clustered index defines the physical order in which the data is stored in the table. It doesn't need to be unique. If you're using one of the columns for filtering most of the time, it's a good idea to make that the table's clustered index because, in that case, instead of seeking an index and then finding the data in the table using pointers, the table itself is sought directly, and that can improve performance.
So there is no simple answer, but I hope you now have enough information to improve your query speed.
EDIT
Corrected info, according to a very interesting comment.

Why is there a non-clustered index scan when counting all rows in a table?

As far as I understand it, each transaction sees its own version of the database, so the system cannot get the total number of rows from some counter and thus needs to scan an index. But I thought it would be the clustered index on the primary key, not the additional indexes. If I had more than one additional index, which one will be chosen, anyway?
When digging into the matter, I've noticed another strange thing. Suppose there are two identical tables, Articles and Articles2, each with three columns: Id, View_Count, and Title. The first has only a clustered PK-based index, while the second one has an additional non-clustered, non-unique index on view_count. The query SELECT COUNT(1) FROM Articles runs 2 times faster for the table with the additional index.
SQL Server will optimize your query - if it needs to count the rows in a table, it will choose the smallest possible set of data to do so.
So if you consider your clustered index - it contains the actual data pages - possibly several thousand bytes per row. To load all those bytes just to count the rows would be wasteful - even just in terms of disk I/O.
Therefore, if there is a non-clustered index that's not filtered or restricted in any way, SQL Server will pick that data structure to count, since the non-clustered index basically contains only the columns you've put into it (plus the clustered index key). That is much less data to load just to count the number of rows.
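One way to check this on the two tables from the question is to compare the reads for each count and the size of each index. This is a sketch: the index name is hypothetical, and the CREATE INDEX line is only needed if Articles2 does not already have that index.

```sql
-- The additional index described in the question (hypothetical name).
CREATE NONCLUSTERED INDEX IX_Articles2_View_Count
    ON Articles2 (View_Count);

-- Compare the logical reads reported for each count.
SET STATISTICS IO ON;
SELECT COUNT(1) FROM Articles;    -- only the clustered index is available
SELECT COUNT(1) FROM Articles2;   -- the narrower non-clustered index can be used
SET STATISTICS IO OFF;

-- Pages used by each index on Articles2.
SELECT i.name, i.type_desc, SUM(ps.used_page_count) AS used_pages
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
  ON ps.object_id = i.object_id
 AND ps.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID('Articles2')
GROUP BY i.name, i.type_desc;
```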

is count(indexed column) faster than count(*)? [duplicate]

Possible Duplicate:
Performance of COUNT SQL function
Hi all,
I have very large tables and I need to know the number of records in each. My question is: does it reduce the run time if I run
select count(indexed column like my PK) from tbTest
instead of
select count(*) from tbTest
see Performance of COUNT SQL function
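The important thing to note is that they are not equivalent: COUNT(column) counts only the rows where that column is not NULL, whereas COUNT(*) counts every row.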
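A small illustration of why the two forms can return different results, using a throwaway table variable (the column names are made up):

```sql
DECLARE @t TABLE (Id int NOT NULL, Nickname varchar(20) NULL);

INSERT INTO @t (Id, Nickname)
VALUES (1, 'a'), (2, NULL), (3, 'c');

-- COUNT(*) counts rows; COUNT(Nickname) skips the row where Nickname is NULL.
SELECT COUNT(*)        AS AllRows,           -- 3
       COUNT(Id)       AS NonNullIds,        -- 3, because Id is NOT NULL
       COUNT(Nickname) AS NonNullNicknames   -- 2
FROM @t;
```

For a NOT NULL column such as a primary key, the two forms return the same number, which is why the question then comes down to performance.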
Since the question is whether or not there is a performance difference, it would depend on the index. When you do COUNT(*), it will use the PK column(s) to determine the number of rows. If you do not have any indexes besides a clustered index on the PK column(s), it will scan the leaf nodes of the clustered index. That's probably a lot of pages. If you have a non-clustered index that is skinnier than the clustered index, it will choose that instead, resulting in fewer reads.
So, if the column you select is contained in the smallest possible non-clustered index on the table, the SQL query optimizer will choose that index for both count(*) (if you have a clustered index that is the PK) and count(indexed_column). If you choose a count(indexed_col) whose column is only contained in a wide index, then count(*) will be faster if your PK is a clustered index. The reason this works is that every non-clustered index contains a pointer to the clustered index, so SQL Server can figure out the number of rows from that non-clustered index.
So, as usual in SQL Server, it depends. Do a showplan and compare the queries to each other.
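A sketch of that comparison, using the table name from the question; Id stands in for whatever your PK column is actually called. GO is the batch separator used by SSMS and sqlcmd, and SET SHOWPLAN_TEXT has to be alone in its batch.

```sql
SET SHOWPLAN_TEXT ON;
GO

-- With SHOWPLAN_TEXT on, these statements are not executed;
-- SQL Server returns the estimated plan for each instead.
SELECT COUNT(*)  FROM tbTest;
SELECT COUNT(Id) FROM tbTest;   -- Id is a hypothetical PK column name
GO

SET SHOWPLAN_TEXT OFF;
GO
```

If both plans show a scan of the same index, the two forms cost the same on that table.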
SELECT COUNT(*) may be faster. That is because using * gives the optimizer the liberty to choose any column to count on. Say you have a primary key on an INT column, and a non-clustered key on a different bigint column. The primary key is likely the clustered index, and as such it is in fact significantly larger than the non-clustered bigint index (it has more pages). So if the optimizer is free to choose the bigint non-clustered index, it can return the response faster. Possibly much faster, depending on the table.
So overall it is generally better to leave it as COUNT(*) and let the optimizer choose.
Most likely, if the query scans the index instead of the whole table.
It is an easy thing to test; become your own scientist.
Both are identical. If you look at the query execution plan for both, both will do an "index scan".