Why SQL Server index is not used? - sql

Most of my SQL queries include WHERE rec_id <> 'D', for example:
select * from Table1 where Field1 = 'ABC' and rec_id <> 'D'
I added an index on REC_ID, but when I run this query and look at the execution plan, the new index is not used. The plan shows 50% of the cost on the nonclustered index on Field1 and 50% on an RID Lookup (Heap) on Table1.
Why is the index on REC_ID not used?

For this query:
select *
from Table1
where Field1 = 'ABC' and rec_id <> 'D';
The best index is table1(Field1, rec_id).
However, your query may not be able to take advantage of an index. The goal of using an index for a where clause is to reduce the number of pages that need to be read. To understand the concept for non-clustered indexes on normal rows, you need some basic ideas:
Records are stored on pages.
Each page is 8,192 bytes (slightly fewer used for data) and can store some number of records.
The entire page is loaded into memory to read a record.
Say a record is about 80 bytes and there are 100 records on each page. If 10% of the records have Field1 = 'ABC', then there will be about ten on each page. That means that using the index would not (typically) save any page reads. If 1% of the records match, then there is about one on each page. The index still isn't helpful.
If only 0.01% of the records match (30 in your case), then only a fraction of the pages need to be read. This is the sweet spot for indexes, and where they are really helpful.
The fraction of records that match is called the "selectivity" of the condition. If the where clause is not very selective (it matches a large share of the rows), then a non-clustered index will not be useful.
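The page arithmetic above can be sketched in a few lines of Python. This is only a rough model: it assumes matching rows are scattered independently and uniformly across pages, and the function name `fraction_of_pages_touched` is made up for illustration.

```python
def fraction_of_pages_touched(selectivity: float, rows_per_page: int) -> float:
    """Expected fraction of data pages holding at least one matching row.

    Assumption: each row matches independently with probability `selectivity`,
    so a page with `rows_per_page` rows has no match with probability
    (1 - selectivity) ** rows_per_page.
    """
    return 1.0 - (1.0 - selectivity) ** rows_per_page


# The three cases from the answer: 10%, 1% and 0.01% of rows matching,
# with 100 rows on each page.
for selectivity in (0.10, 0.01, 0.0001):
    share = fraction_of_pages_touched(selectivity, rows_per_page=100)
    print(f"{selectivity:.2%} of rows match -> about {share:.1%} of pages read")
```

Under this model, nearly every page is read at 10%, well over half at 1%, and only about 1% of the pages at 0.01%, which is where the index starts to pay off.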
Sometimes, a clustered index can be helpful in this situation. However, clustered indexes may have more overhead for insert and certain update transactions. So, the choice of index needs to be based on the queries being processed and other ways that the table is used.

SQL Server uses many factors to decide which indexes to use. It must have determined that using the index on Field1 would be more effective than using the index on rec_id, meaning that field1 = {value} defines a smaller set than rec_id <> {value} based on data distribution, etc., so there are fewer records to compare against the other condition. Note that the actual value is usually irrelevant in determining which index to use.

Related

Oracle indexes. "DISTINCT_KEYS" vs "NUM_ROWS". Do I need an NONUNIQUE index?

I have a table with a lot of indexes. I noticed that in one of them "DISTINCT_KEYS" is almost the same as "NUM_ROWS". Is such an index needed?
Or maybe it is better to remove it because:
it takes up space in the database.
when adding data to the table, it unnecessarily slows down the refreshing of indexes.
What do you think? Will deleting this index slow down the queries that filter on this column?
Is such an index needed?
All you can tell from statistics like DISTINCT_KEYS and NUM_ROWS (and other statistics like histograms) is whether an index might be useful. An index is only truly "needed" if it is actually being used by queries in your system. (See ALTER INDEX ... MONITORING USAGE command)
An index having DISTINCT_KEYS that is almost equal to NUM_ROWS certainly might be useful. In fact, it would be much more natural to suspect an index to be useless if DISTINCT_KEYS was a very low percentage of NUM_ROWS.
Suppose you have a query:
SELECT column_x
FROM table_y
WHERE column_z = :some_value
Suppose the index on column_z shows DISTINCT_KEYS = 999999 and NUM_ROWS = 1000000.
That means, on average, each distinct key has (very) slightly more than one row. That makes the index very selective and very useful. When our query runs, we will use the index to pull out only one row of the table very quickly.
Suppose, instead, the index on column_z shows DISTINCT_KEYS = 2 and NUM_ROWS = 1000000. Now, each distinct key has an average of 500,000 rows. This index is worthless because we have to read half of the blocks from the index and then still probably wind up reading at least half of the blocks from the table (probably way more than half). Worse, these reads are all single-block reads. It would be way, way faster for Oracle to ignore the index and do a full table scan: fewer blocks in total to read, and all the reads are multi-block reads (e.g., 8 at a time).
For completeness, I'll point out that an index with DISTINCT_KEYS = 2 and NUM_ROWS = 1000000 could still be useful if the data is very skewed. That is, for example, if one distinct key had 999,000 rows and the other distinct key had only 1,000 rows. The index would be useful for finding the rows of that other (smaller) distinct key. Oracle gathers histograms as part of its statistics to keep track of which columns have skewed data and, if so, how many rows there are for each distinct key. (Over-simplification).
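The arithmetic behind these cases is simple enough to sketch in Python. The function names and the two-bucket `skewed` histogram are made up for illustration; they only mirror the numbers used in this answer and are not how Oracle stores its statistics.

```python
def average_rows_per_key(num_rows: int, distinct_keys: int) -> float:
    """Average number of rows sharing one key value (NUM_ROWS / DISTINCT_KEYS)."""
    return num_rows / distinct_keys


def fraction_for_key(histogram: dict, key) -> float:
    """Share of all rows that carry `key`, given per-key row counts."""
    total = sum(histogram.values())
    return histogram[key] / total


# Highly selective index: barely more than one row per key.
print(average_rows_per_key(1_000_000, 999_999))

# Two distinct keys: 500,000 rows per key on average.
print(average_rows_per_key(1_000_000, 2))

# Skewed data: the average hides that one key is rare.
skewed = {"A": 999_000, "B": 1_000}   # hypothetical per-key row counts
print(fraction_for_key(skewed, "B"))  # share of rows for the rare key
```

The average alone would write off the two-key index, while the per-key counts show that lookups for the rare key touch only a tiny share of the table.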
TL;DR It's very likely a good index and no more likely to be "unneeded" than any other index in your system.

Index Decreases Number of Rows Read; No performance Gain

I created a non-clustered, non-unique index on a column (date) on a large table (16 million rows), but am getting very similar query speeds when compared to the exact same query that's being forced to not use any indexes.
Query 1 (uses index):
SELECT *
FROM testtable
WHERE date BETWEEN '01/01/2017' AND '03/01/2017'
ORDER BY date
Query 2 (no index):
SELECT *
FROM testtable WITH(INDEX(0))
WHERE date BETWEEN '01/01/2017' AND '03/01/2017'
ORDER BY date
Both queries take the same amount of time to run and return the same result. When looking at the execution plan for each, Query 1's number of rows read is ~4 million rows, whereas Query 2 is reading 106 million rows. It appears that the index is working, but I'm not gaining any performance benefits from it.
Any ideas as to why this is, or how to increase my query speed in this case would be much appreciated.
Create Indexes with Included Columns: Cover index
This topic describes how to add included (or nonkey) columns to extend the functionality of nonclustered indexes in SQL Server by using SQL Server Management Studio or Transact-SQL. By including nonkey columns, you can create nonclustered indexes that cover more queries. This is because the nonkey columns have the following benefits:
They can be data types not allowed as index key columns.
They are not considered by the Database Engine when calculating the number of index key columns or index key size.
An index with nonkey columns can significantly improve query performance when all columns in the query are included in the index either as key or nonkey columns. Performance gains are achieved because the query optimizer can locate all the column values within the index; table or clustered index data is not accessed resulting in fewer disk I/O operations.
CREATE NONCLUSTERED INDEX IX_your_index_name
ON testtable (date)
INCLUDE (col1,col2,col3);
GO
You need to build an index around the need of your query - this quick and free video course should bring you up to speed really quick.
https://www.brentozar.com/archive/2016/10/think-like-engine-class-now-free-open-source/

SQL Server multiple index order optimization

I have a table with a nonclustered index1 on ID1 and ID2, in that order.
Select count(distinct(id1)) from table
returns 1
and Select count(distinct(id2)) from table has all the values of the table.
The queries on that table use ... where id1 = XX and id2 = XX
Could it make any performance improvement if I switch the order of the fields of index1 ?
I know it SHOULD be better but maybe: is it indifferent because id1 has only 1 value?
If I understand correctly, you are comparing these two statements:
where id1 = XX and id2 = XX
where id2 = XX and id1 = XX
Under most circumstances, this would use either an index on table(id1, id2) or table(id2, id1). The order of the comparisons in the where (or on) clauses has no impact on which indexes can be used.
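One way to check the claim about predicate order is to compare query plans. The sketch below uses Python's built-in sqlite3 module purely as a stand-in, since it needs no server; SQL Server has its own optimizer and its plans will look different. The table `t` and the `payload` column are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in (assumption: not SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id1 INTEGER, id2 INTEGER, payload TEXT);
    CREATE INDEX index1 ON t (id1, id2);
""")


def plan(sql: str) -> str:
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # detail is the last column


# Same predicates, written in the two possible orders.
plan_a = plan("SELECT * FROM t WHERE id1 = 1 AND id2 = 2")
plan_b = plan("SELECT * FROM t WHERE id2 = 2 AND id1 = 1")

print(plan_a)
print(plan_b)
```

Both statements should report the same search on index1, which matches the point that the order of comparisons in the where clause does not decide which index can be used.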
Whether you should include a column that has only a single value in the unique index is a different matter. There is a minor performance effect to having a more complex index -- the tree structure has to store more bytes for each key. However, the query:
select count(distinct id2)
from table
where id1 = xx and id2 = xx
will actually run faster with a composite index than with a singleton index table(id2). The reason is that the composite index can be used to entirely satisfy the query (in the jargon, it is a "covering index for the query"). The singleton index would need to look up the value of id1 in the table data, which requires extra processing.
The order in which you define the columns in your index matters. If column ID1 will only ever have one value, there is no point in putting it into the index, unless you are using it to make a non-clustered index covering (a non-clustered index being one that is not the physical ordering of the table itself). In general, the first column in your index should be the column with the most varying values that you need to search on. Visualize it this way: if you had a table of 1 million rows and the first column in your index had only one (or a small number of) distinct values, would that index help you find the rows you want among the 1 million? Or would it be better to have ID2 first, which is more efficient for the search and would be used more frequently? That is what you have to ask yourself. Below is more info on your question.
SQL Server Clustered Index - Order of Index Question
If you are using a non-clustered index, it may appear not to make a difference if the first column in your index holds the same value in every row. However, it does matter, because a non-clustered index is stored on a number of pages. The more entries you can store on a page, the faster you can search. If you include a column that adds no value to the search, the same index will span more pages, which means more pages to flip through and longer lookups. It also leaves less room to add new entries to an existing page when the index is updated during inserts, causing more page splits. So there are side effects to adding a column with only one value to the index. If you are using the column to "cover" retrieved values in common selects, you can instead use included columns in your index, which has the added benefit of not reordering your index while still acting like a covering index, if that was the original purpose of adding a column that has only one value.
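The effect of a wider key on the number of index pages can be estimated roughly in Python. The numbers here are assumptions for illustration only: about 8,096 usable bytes per page, a made-up 12 bytes of per-entry overhead, and 4-byte integer columns. Real index rows have more detailed layouts.

```python
import math

USABLE_PAGE_BYTES = 8096   # assumed: 8,192-byte page minus a 96-byte header
ENTRY_OVERHEAD_BYTES = 12  # assumed per-entry overhead, illustrative only


def leaf_pages(row_count: int, key_bytes: int) -> int:
    """Rough count of leaf pages needed to hold `row_count` index entries."""
    entries_per_page = USABLE_PAGE_BYTES // (key_bytes + ENTRY_OVERHEAD_BYTES)
    return math.ceil(row_count / entries_per_page)


# Hypothetical 4-byte integer columns: ID2 alone versus ID1 + ID2.
narrow = leaf_pages(1_000_000, key_bytes=4)
wide = leaf_pages(1_000_000, key_bytes=8)

print(f"index on (ID2):      about {narrow} leaf pages")
print(f"index on (ID1, ID2): about {wide} leaf pages")
```

The extra single-valued column adds nothing to the search but makes every entry larger, so the same rows spread over more pages.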

Index performance with WHERE clause in SQL

I'm reading about indexes in my database book and I was wondering if I was correct in my assumption that a WHERE clause with a non-constant expression in it will not use the index.
So if I have
SELECT * FROM statuses WHERE app_user_id % 10 = 0;
This would not use an index created on app_user_id. But
SELECT * FROM statuses WHERE app_user_id = 5;
would use the index on app_user_id.
Usually (there are other options) a database index is a B-Tree, which means that you can do range scans on it (including equality scans).
The condition app_user_id % 10 = 0 cannot be evaluated with a single range scan, which is why a database will probably not use an index.
It could still decide to use the index in another way, namely for a full scan: Reading the whole table takes more time than just reading the whole index. On the other hand, after reading the index you may still get back to the table, so the overall cost may end up being higher.
This is up to the database query optimizer to decide.
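The difference between the two predicates can be observed with a query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in because it runs without a server; other databases have their own optimizers and may choose differently. The `id` and `body` columns and the index name are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in (assumption: your database may differ).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE statuses (id INTEGER PRIMARY KEY, app_user_id INTEGER, body TEXT);
    CREATE INDEX ix_statuses_user ON statuses (app_user_id);
""")


def plan(sql: str) -> str:
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # detail is the last column


# Expression on the column: no single range of the B-tree matches it.
expr_plan = plan("SELECT * FROM statuses WHERE app_user_id % 10 = 0")

# Plain equality: a lookup in the B-tree.
eq_plan = plan("SELECT * FROM statuses WHERE app_user_id = 5")

print(expr_plan)
print(eq_plan)
```

In SQLite the modulo query is reported as a scan of the table, while the equality query is reported as a search using the index.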
A few examples:
select app_user_id from t where app_user_id % 10 = 0
Here, you do not need the table at all, all necessary data is in the index. The database will most likely do a full index scan.
select count(*) from t where app_user_id % 10 = 0
Same. Full index scan.
select count(*) from t
Only if app_user_id is NOT NULL can this be done with the index (because NULL data is not in the index, at least on Oracle, at least on single column indexes, your database may handle this differently).
Some databases do not need to access the table or an index for this at all, because they maintain row counts in the metadata.
select * from t where app_user_id = 5
This is the classic scenario for an index. The database can look at the small section of the index tree, retrieve a small (just one if this was a unique or primary index) number of rowids and fetch those selectively from the table.
select * from t where app_user_id between 5 and 10
Another classic index case. Range scan in the tree returns a small number of rowids to fetch from the table.
select * from t where app_user_id between 5 and 10 order by app_user_id
Since index scans return ordered data, you even get the sorting for free.
select * from t where app_user_id between 5 and 1000000000
Maybe here you should not be using an index. It seems to match too many records. This is a case where having bind variables hide the range from the database could actually be detrimental.
select * from t where app_user_id between 5 and 1000000000
order by app_user_id
But here, since sorting would be very expensive (even taking up temporary swap disk space), maybe iterating in index order is good. Maybe.
select * from t where app_user_id % 10 = 0
This is difficult to decide. We need all columns, so ultimately the query needs to touch the table. The question is whether to go through an index first. The query returns approximately 10% of the whole table. That is probably too much for an index access path to be efficient. If the optimizer has reason to believe that the query returns much less than 10% of the table, an index scan followed by accessing the table might be good. Same if the table is very fragmented (lots of deleted rows eating up space).

What is a Covered Index?

I've just heard the term covered index in some database discussion - what does it mean?
A covering index is an index that contains all of, and possibly more, the columns you need for your query.
For instance, this:
SELECT *
FROM tablename
WHERE criteria
will typically use indexes to speed up the resolution of which rows to retrieve using criteria, but then it will go to the full table to retrieve the rows.
However, if the index contained the columns column1, column2 and column3, then this sql:
SELECT column1, column2
FROM tablename
WHERE criteria
and, provided that particular index could be used to speed up the resolution of which rows to retrieve, the index already contains the values of the columns you're interested in, so it won't have to go to the table to retrieve the rows, but can produce the results directly from the index.
This can also be used if you see that a typical query uses 1-2 columns to resolve which rows, and then typically adds another 1-2 columns, it could be beneficial to append those extra columns (if they're the same all over) to the index, so that the query processor can get everything from the index itself.
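A small experiment shows the difference between the two queries above. It uses Python's built-in sqlite3 module as a stand-in, since SQLite labels covering access explicitly in its plan output; other databases report this differently. The `id` and `other` columns, the index name, and the `column1 = 1` criteria are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in (assumption: your database may differ).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablename (
        id INTEGER PRIMARY KEY,
        column1 INTEGER, column2 INTEGER, column3 INTEGER,
        other TEXT
    );
    CREATE INDEX ix_cover ON tablename (column1, column2, column3);
""")


def plan(sql: str) -> str:
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # detail is the last column


# Only indexed columns requested: the index alone can answer the query.
narrow_plan = plan("SELECT column1, column2 FROM tablename WHERE column1 = 1")

# SELECT * also needs `other`, so the table must still be visited.
wide_plan = plan("SELECT * FROM tablename WHERE column1 = 1")

print(narrow_plan)
print(wide_plan)
```

The first plan should mention a covering index, while the second uses the same index only to find the rows and then goes back to the table.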
Here's an article: Index Covering Boosts SQL Server Query Performance on the subject.
A covering index is just an ordinary index. It's called "covering" if it can satisfy a query without needing to read the table data.
example:
CREATE TABLE MyTable
(
ID INT IDENTITY PRIMARY KEY,
Foo INT
)
CREATE NONCLUSTERED INDEX index1 ON MyTable(ID, Foo)
SELECT ID, Foo FROM MyTable -- All requested data are covered by index
This is one of the fastest methods to retrieve data from SQL server.
Covering indexes are indexes which "cover" all columns needed from a specific table, removing the need to access the physical table at all for a given query/ operation.
Since the index contains the desired columns (or a superset of them), table access can be replaced with an index lookup or scan -- which is generally much faster.
Columns to cover:
parameterized or static conditions; columns restricted by a parameterized or constant condition.
join columns; columns dynamically used for joining
selected columns; to answer selected values.
While covering indexes can often provide good benefit for retrieval, they do add somewhat to insert/ update overhead; due to the need to write extra or larger index rows on every update.
Covering indexes for Joined Queries
Covering indexes are probably most valuable as a performance technique for joined queries. This is because joined queries are more costly and more likely than single-table retrievals to suffer high-cost performance problems.
in a joined query, covering indexes should be considered per-table.
each 'covering index' removes a physical table access from the plan & replaces it with index-only access.
investigate the plan costs & experiment with which tables are most worthwhile to replace by a covering index.
by this means, the multiplicative cost of large join plans can be significantly reduced.
For example:
select poi.title, c.name, c.address
from porderitem poi
join porder po on po.id = poi.fk_order
join customer c on c.id = po.fk_customer
where po.orderdate > ? and po.status = 'SHIPPING';
create index porder_custitem on porder (orderdate, id, status, fk_customer);
See:
http://literatejava.com/sql/covering-indexes-query-optimization/
Let's say you have a simple table with the columns below, and you have only indexed Id:
Id (Int), Telephone_Number (Int), Name (VARCHAR), Address (VARCHAR)
Imagine you have to run the query below and check whether it uses an index and whether it performs efficiently without I/O calls. Remember, you have only created an index on Id.
SELECT Id FROM mytable WHERE Telephone_Number = '55442233';
When you check the performance of this query you will be disappointed: since Telephone_Number is not indexed, the rows have to be fetched from the table using I/O calls. So this is not a covering index, because a column in the query is not indexed, which leads to frequent I/O calls.
To make it a covered index you need to create a composite index on (Id, Telephone_Number).
For more details, please refer to this blog:
https://www.percona.com/blog/2006/11/23/covering-index-and-prefix-indexes/