How do I make the database perform an index scan?

Simple question, I think. I want the database to do an index scan on a table, but it isn't doing it. I have a table with a unique clustered index on the ID column and two other columns, first_name and last_name. The following was my query:
SELECT FIRST_NAME
FROM TABLE_A
WHERE FIRST_NAME LIKE 'GUY'
I thought that since I wasn't searching on the indexed column, it would do an index scan.
Why isn't it working, and how do I make sure I can get this to work every time I want it to?

Since first_name is not part of any index, there's no point in the database using one: it would have to scan the entire index, access the actual table row for each entry, and evaluate the first_name value there. Since it's accessing all of the table's rows anyway, the optimizer simply prefers to perform a full table scan and save the (useless) index accesses.
If you want to use an index to speed up your query, you should create one that covers this column. E.g.:
CREATE INDEX table_a_first_name_ind ON table_a(first_name)
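If you want to check whether the new index is actually being used, you can look at the query plan. A minimal sketch, assuming SQL Server and the table/index names above:
-- assumes SQL Server, with table_a and table_a_first_name_ind as created above
SET SHOWPLAN_TEXT ON;
GO
SELECT first_name FROM table_a WHERE first_name LIKE 'GUY';
GO
SET SHOWPLAN_TEXT OFF;
GO
With the single-column index in place, the plan should show access to table_a_first_name_ind rather than a full table scan, since LIKE without a leading wildcard is seekable.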

Related

Why table scan when filtering on index?

Question: Why is the execution plan showing a table scan when filtering on an index?
Database Engine: SQL Server 2019
Table Row Count: 4,000
I have read many articles on how non-clustered indexes are used in queries and dug through a couple of posts here, yet the behavior I see in the execution plan is not consistent with the theories.
Here's the simple table (a heap; I know it should have a clustered index, and normally it would; this is just to demonstrate the issue):
Create Table People (LastName NVarchar(50), FirstName NVarchar(50))
Create NONCLUSTERED INDEX Idx_LastName ON People (LastName)
The table is populated with 4,000 rows of data, no nulls.
If I execute the following query:
SELECT LastName FROM People WHERE LastName = 'Smith'
I get what I would expect, which is the Index Seek on Idx_LastName.
However, if I execute:
SELECT LastName, FirstName FROM People WHERE LastName = 'Smith'
The execution plan shows a Table Scan. I infer that this is a "full table scan."
Why do I get a Table Scan when I am still only filtering on an indexed column and am only selecting values from the non-indexed column that specifically match the filtered subset of data - in this case lastnames matching 'smith'?
I understand that the query engine needs to scan the full set of first names associated with the last name 'Smith', but that pool of names should already be filtered, because it's only the first names associated with 'Smith' - which is a small subset of the total number of first names in the table.
So, why a full table scan?
Maybe it's not really a full table scan, but a partial table scan? But, if that's the case, I have a hard time proving it. Also, possibly 4,000 records is not enough to force the query engine to do an index seek, but I think it should be since just selecting on LastName results in a seek.
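One way to see the seek come back for the two-column query (a sketch, assuming the People heap and Idx_LastName index described above) is to make the index cover the query, so the matching rows don't need RID lookups into the heap:
-- hypothetical covering index: FirstName rides along as an included column
CREATE NONCLUSTERED INDEX Idx_LastName_Covering ON People (LastName) INCLUDE (FirstName)
With the original single-column index, the optimizer has to weigh an index seek plus one RID lookup per matching row against a single scan of a small heap, and at around 4,000 rows it will often judge the scan cheaper.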

Index is not getting used

This is an excerpt from Tom Kyte's book.
"We’re using a SELECT COUNT(*) FROM T query (or something similar)
and we have a B*Tree index on table T. However, the optimizer is full
scanning the table, rather than counting the (much smaller) index
entries. In this case, the index is probably on a set of columns that
can contain Nulls. Since a totally Null index entry would never be
made, the count of rows in the index will not be the count of rows in
the table. Here the optimizer is doing the right thing—it would get
the wrong answer if it used the index to count rows."
As far as I know, indexes come into the picture when we use a WHERE clause. Why would an index come into play in the above scenario? Before countering him, I wanted to know the facts.

"As far as I know, indexes come into the picture when we use a WHERE clause."
That's one use case for indexes, when we want quick access to rows identified by specific values of indexed column(s). But there are other uses.
Counting rows is one. To count the number of rows in a table Oracle actually has to count each row (because statistics may not be fresh enough), which means literally reading each block of storage and counting the rows in each block. Potentially that's a lot of reads.
However, an index on a NOT NULL column also has an entry for each row of the table. Indexes are much smaller than tables (typically only one column) so an Index block contains many more entries than a Table block. Consequently Oracle has to read far fewer Index blocks to get the count of rows than scanning the table would require. Reading fewer blocks is faster than reading more blocks.
This isn't true if the table only has indexes on nullable columns. Oracle doesn't index null values (unless the index is a composite index and at least one column is populated) so a count of the entries in an index couldn't guarantee to be the actual count of the table's rows.
Another common use case for reading indexes is to satisfy a SELECT statement where all the columns in a projection are in one index, and the index also services any WHERE conditions.
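For example (a sketch with hypothetical table and index names, not taken from the book):
-- hypothetical: the index key covers both the projection and the predicate
CREATE INDEX emp_name_idx ON employees (last_name, first_name);

-- Oracle can answer this from emp_name_idx alone, without touching the table
SELECT last_name, first_name
FROM employees
WHERE last_name = 'KING';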
Oracle Database does not store NULLs in the B-tree index; see the documentation:
Oracle Database does not index table rows in which all key columns are
null, except for bitmap indexes or when the cluster key column value
is null.
Because of this, if the index has been created on a column that may contain null values, the database cannot use this index in a query like SELECT COUNT(*) FROM T. Even when the column does not contain any NULLs, the optimizer doesn't know this unless the column has been marked as NOT NULL.
According to the documentation - FAST FULL INDEX SCAN
Fast Full Index Scan
A fast full index scan is a full index scan in
which the database accesses the data in the index itself without
accessing the table, and the database reads the index blocks in no
particular order.
Fast full index scans are an alternative to a full table scan when
both of the following conditions are met:
The index must contain all columns needed for the query.
A row containing all nulls must not appear in the query result set.
For this result to be guaranteed, at least one column in the index
must have either:
A NOT NULL constraint
A predicate applied to the column that prevents nulls from being
considered in the query result set
So if you know that the indexed column cannot contain NULL values, then mark this column as NOT NULL using ALTER TABLE table_name MODIFY column_name column_type NOT NULL; and the database will use that index in the query: SELECT COUNT(*) FROM T
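Putting that together (a sketch, assuming table T from the excerpt has an indexed column COL that never actually holds NULLs):
-- tell the optimizer the indexed column can never be NULL
ALTER TABLE t MODIFY col NOT NULL;

-- every row now has an index entry, so the count can be satisfied
-- with a fast full index scan instead of a full table scan
SELECT COUNT(*) FROM t;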
If the column can have nulls and cannot be marked as NOT NULL, then use the solution from Gordon Linoff's answer below.
You can force the indexing of NULL values by including a constant in the index:
create index t_table_col on t(col, 0);
The 0 is a constant expression that is never NULL.
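A side benefit of this approach (same hypothetical table t and index as above): because rows with a NULL col now have index entries, predicates on NULL can use the index as well.
-- the (col, 0) entries exist even when col is NULL, so the optimizer can use
-- an index range scan (plus table access by rowid) instead of a full table scan
SELECT * FROM t WHERE col IS NULL;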

In a nonclustered index, how are the second, third, fourth ... columns sorted?

I have this question about SQL Server indexes that has been bugging me of late.
Imagine a table like this:
CREATE TABLE TelephoneBook (
FirstName nvarchar(50),
LastName nvarchar(50),
PhoneNumber nvarchar(50)
)
with an index like this:
CREATE NONCLUSTERED INDEX IX_LastName ON TelephoneBook (
LastName,
FirstName,
PhoneNumber
)
and imagine that this table has hundreds of thousands of rows.
Let's say I want to select everyone whose last name starts with a B and the firstname is 'John'. I would write the following query:
SELECT
*
FROM TelephoneBook
WHERE LastName like 'B%'
AND FirstName='John'
Since the index can help to reduce the number of rows we need to scan because it groups all of the LastNames that start with a B anyway, does it also do this for the FirstName? Or does the database scan every row that starts with a B to find the ones with the first name 'John'?
In other words, how are the second, third, fourth, ... columns sorted in an index? Are they alphabetical in this case as well, so it's pretty easy to find Johanna? Or are they in some sort of a random or different order?
EDIT: The reason I ask is that I have just read that in the above SELECT statement, the index will only be used to narrow the search down to the records where the last name starts with a B, but that the index will NOT be used to find the rows with 'John' in them (it will resort to scanning all of the 'B' rows). I'm wondering why that is. What am I not getting?
As a convenient shorthand: the keys of an index are used for the WHERE clause up to the first inequality, and LIKE with a wildcard is considered an inequality.
So, the index will only be used for looking up the first value. However, the entries will probably be scanned to match on the first name, so you will still get index usage.
Of course, the optimizer may decide not to use the index at all, if it decides that a full-table scan is more appropriate.
Gordon's answer is correct in this instance with the specified query. In general, you should be aware that it's not so much grouping records together in "buckets" based on the values of the columns, but rather ordering them according to the index's key columns. In other words, your records in this index will be ordered according to LastName, and for records that share the same LastName value they will be further ordered by FirstName value, and then by PhoneNumber value. You didn't specify a sort order for your columns on this index, but SQL Server defaults unspecified sort orders to ASC(ending), so those columns are indeed lexically sorted in the index.
In your particular case, the query optimizer has decided to look at the index for the first column to determine which records to grab, as Gordon's answer mentions, but SQL Server will reorder predicates if the optimizer decides that would be better, and may use more columns of the index or none at all, depending on the query itself and statistics on the records you are querying.
Logically speaking, the index is sorted by key values in the order of the key. So in this case: LastName (sorted as text), then FirstName (sorted as text), and then PhoneNumber (sorted as text). Any included columns are not sorted at all.
In your case, we know that trailing wildcards are still SARGable, so we'd expect to see an index seek narrowing the data down to all rows with LastNames starting with 'B'; from that data pool, it will be further filtered to include only the rows that have FirstName = 'John'. You can think of it as a range seek on LastName followed by a residual filter on FirstName.
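To make that concrete (a sketch against the TelephoneBook table and IX_LastName index above; the index hint is only there to make the plan easy to inspect):
-- seek predicate: LastName LIKE 'B%' (a range of the index key)
-- residual predicate: FirstName = 'John', checked on the rows inside that range
SELECT LastName, FirstName, PhoneNumber
FROM TelephoneBook WITH (INDEX(IX_LastName))
WHERE LastName LIKE 'B%'
AND FirstName = 'John'
In the actual execution plan you should see a single Index Seek whose Seek Predicates carry the LastName range and whose residual Predicate carries the FirstName filter.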

SQL Server index included columns

I need help understanding how to create indexes. I have a table that looks like this
Id
Name
Age
Location
Education
PhoneNumber
My query looks like this:
SELECT *
FROM table1
WHERE name = 'sam'
What's the correct way to create an index for this with included columns?
What if the query has an ORDER BY statement?
SELECT *
FROM table1
WHERE name = 'sam'
ORDER BY id DESC
What if I have 2 parameters in my where statement?
SELECT *
FROM table1
WHERE name = 'sam'
AND age > 12
The correct way to create an index with included columns? Either via Management Studio/Toad/etc, or SQL (documentation):
CREATE INDEX idx_table_1 ON db.table_1 (name) INCLUDE (id)
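Since your query is SELECT *, an index keyed on name that only includes id won't cover it; a covering variant would carry the remaining columns as well (a sketch, with column names taken from your table description):
-- hypothetical covering index for SELECT * FROM table1 WHERE name = 'sam'
CREATE INDEX idx_table1_name_covering
ON table1 (name)
INCLUDE (age, location, education, phonenumber)
If id is the clustered key, it travels with every nonclustered index automatically, so it doesn't need to be listed in the INCLUDE.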
What if the Query has an ORDER BY
The ORDER BY can use indexes if the optimizer sees fit (determined by table statistics and the query). It's up to you to test whether a composite index or an index with INCLUDE columns works best, by reviewing the query cost.
If id is the clustered key (not always the primary key though), I probably wouldn't INCLUDE the column...
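If id is not the clustered key, one option worth testing for the ORDER BY query is a composite key that matches both the filter and the sort, so the sort step can be skipped (a sketch; you would still need included columns or lookups to satisfy SELECT *):
-- equality column first, then the ORDER BY column in the matching direction
CREATE INDEX idx_table1_name_id ON table1 (name, id DESC)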
What if I have 2 parameters in my where statement?
Same as above - you need to test what works best for your query. Might be composite, or include, or separate indexes.
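As a starting point for the two-parameter query, the usual pattern is a composite key with the equality column first and the range column second (again, just a sketch to test against your own data):
-- equality predicate column (name) leads, range predicate column (age) follows
CREATE INDEX idx_table1_name_age ON table1 (name, age)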
But keep in mind that:
tweaking for one query won't necessarily benefit every other query
indexes do slow down INSERT/UPDATE/DELETE statements, and require maintenance
You can use the Database Engine Tuning Advisor (DTA) for index recommendations, including when some are redundant
Recommended reading
I highly recommend reading Kimberly Tripp's "The Tipping Point" for a better understanding of index decisions and impacts.
Since I do not know exactly which tasks your DB is going to perform or how many records are in it, I would suggest that you take a look at the Index Basics MSDN article. It will allow you to decide for yourself which indexes to create.
If ID is your primary and/or clustered index key, just create an index on Name, Age. This will cover all three queries.
Included fields are best used to retrieve row-level values for columns that are not in the filter list, or to retrieve aggregate values where the sorted field is in the GROUP BY clause.
If inserts are rare, create as many indexes as you want.
For the first query, create an index on the name column.
The Id column, I think, is already the primary key...
For the second query, create an index on name and age. Or you can keep only one index, (name, age), and it will not be much slower for the first query.

Seek & Scan in SQL Server

After Googling, I came to know that an index seek is better than a scan.
How can I write a query that will result in a seek instead of a scan? I am trying to find this on Google, but so far no luck.
Any simple example with explanation will be appreciated.
Thanks
Search by the primary key column(s)
Search by column(s) with index(es) on them
An index is a data structure that improves the speed of data retrieval operations on a database table. Most databases automatically create an index when a primary key is defined for a table. SQL Server creates the index for a primary key (composite or otherwise) as a clustered index by default, but the clustered index doesn't have to be on the primary key - it can be on other columns.
NOTE:
LIKE '%' + criteria + '%' cannot use an index seek (the leading wildcard forces a scan); LIKE criteria + '%' can
Related reading:
SQL SERVER – Index Seek vs. Index Scan
Index
Which is better: Bookmark/Key Lookup or Index Scan
Extending rexem's feedback:
The clustered index idea for primary keys isn't arbitrary; it's simply a default that makes the primary key clustered. And clustered means that values are physically placed near each other on a SQL Server 8 KB page, the assumption being that if you fetch one value by primary key, you will probably be interested in its neighbors. I don't think it's a good idea to do that for primary keys, since they're usually unique but arbitrary identifiers; it's better to cluster on more useful data. One clustered index per table, by the way.
In a nutshell: if you can filter your query on a clustered index column (where that makes sense), then all the better.
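As a rough sketch of that suggestion (table and column names are illustrative, not from the question):
-- cluster on a column you actually range-scan, keep the surrogate key nonclustered
CREATE CLUSTERED INDEX cx_orders_orderdate ON orders (order_date);
ALTER TABLE orders ADD CONSTRAINT pk_orders PRIMARY KEY NONCLUSTERED (order_id);
Queries that filter on a range of order_date can then seek on the clustered index and read neighboring rows from the same pages.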
An index seek is when SQL Server can use the index's sort order (much like a binary search) to quickly find the rows it needs. The rows in an index are sorted in a particular order, and your query has to specify enough information in the WHERE clause to allow SQL Server to make use of that sort order.
An index scan is when SQL Server cannot use the sort order of the index, but can still use the index itself. This makes sense if the table rows are very large, but the index is relatively small. SQL Server will only have to read the smaller index from disk.
As a simple example, take a phonebook table:
create table phonebook (
  id int identity primary key,
  lastname varchar(50),
  phonenumber varchar(15)
)
Say that there is an index on (lastname). Then this query will result in an index seek:
select * from phonebook where lastname = 'leno'
This query will result in an index scan:
select * from phonebook where lastname like '%no'
The analogy with a real life phonebook is that you can't look up people whose name ends in 'no'. You have to browse the entire phonebook.