During an interview I was asked a weird question. I could not find the correct answer for it, so posting the question below:
I have an index on a column Stud_Name. I am searching for name using a wild card. My query is
a) select * from Stud_Details where Stud_Name like 'A%'
b) select * from Stud_Details where Stud_Name like '%A'.
c) select * from Stud_Details where Stud_Name not like 'A%'
In which case would the SQL server use the Index, that I have created on Stud_Name?
PS: If this question seems idiotic don't get mad on me, get mad on the interviewer who asked this to me.
Also I don't have any info regarding how the index was created. This info above is all I have.
In what cases can SQL Server use an Index on Stud_Name?
Option (a) is the only one that can be used in an index seek. like 'A%' can get converted to a range seek on >= A and <B
Option (b) can't use an index seek as the leading wildcard prevents this. It could still scan an index though.
Option (c) could in theory be converted to two range seeks (< 'A' OR >= 'B' but I've just checked and SQL Server does not do that (even in cases where this would eliminate 100% of the table and with a FORCESEEK hint). Again it can scan an index though.
In what cases will SQL Server use an Index on Stud_Name?
This depends on cardinality estimates and whether the index is covering or not and the relative width of the index rows vs the base table rows.
Assuming the index is not covering then any rows found that match the WHERE clause will need lookups to retrieve the column values. The greater the number of estimated lookups the less likely the non covering index is to be used.
For b+c the choice is index scan + lookups vs table scan with no lookups. The favourability of doing an index scan will be higher if the index is much narrower than the table. If they are similar sizes there is not much IO benefit from reading the index rather than the table in the first place.
Related
I have some questions about index.
First, if I use a index column in WITH clause, dose this column still works as index column in main query?
For example,
WITH TEST AS (
SELECT EMP_ID
FROM EMP_MASTER
)
SELECT *
FROM TEST
WHERE EMP_ID >= '2000'
'EMP_ID' in 'EMP_MASTER' table is PK and index for EMP_MASTER consists of EMP_ID.
In this situation, Does 'Index Scan' happen in main query?
Second, if I join two tables and then use two index columns from each table in WHERE, does 'Index Scan' happen?
For example,
SELECT *
FROM A, B
WHERE A.COL1 = B.COL1 AND
A.COL1 > 200 AND
B.COL1 > 100
Index for table A consists of 'COL1' and index for table B consists of 'COL1'.
In this situation, does 'Index Scan' happen in each table before table join?
If you give me some proper advice, I really appreciate that.
First, SQL is a declarative language, not a procedural language. That is, a SQL query describes the result set, not the specific processing. The SQL engine uses the optimizer to determine the best execution plan to generate the result set.
Second, Oracle has a reasonable optimizer.
Hence, Oracle will probably use the indexes in these situations. However, you should look at the execution plan to see what Oracle really does.
First, if I use a index column in WITH clause, dose this column still works as index column in main query?
Yes. A CTE (the WITH part) is a query just like any other - and if a query references a physical table column used by an index then the engine will use the index if it thinks it's a good idea.
In this situation, Does 'Index Scan' happen in main query?
We can't tell from the limitated information you've provided. An engine will scan or seek an index based on its heuristics about the distribution of data in the index (e.g. STATISTICS objects) and other information it has, such as cached query execution plans.
In this situation, does 'Index Scan' happen in each table before table join?
As it's a range query, it probably would make sense for an engine to use an index scan rather than an index seek - but it also could do a table-scan and ignore the index if the index isn't selective and specific enough. Also factor in query flags to force reading non-committed data (e.g. for performance and to avoid locking).
I have two different queries in SQL Server and I want to clarify
how the execution plan would be different, and
which of them is more efficient
Queries:
SELECT *
FROM table_name
WHERE column < 2
and
SELECT column
FROM table_name
WHERE column < 2
I have a non-clustered index on column.
I used to use Postgresql and I am not familiar with SQL Server and these kind of indexes.
As I read many questions here I kept two notes:
When I have a non-clustered index, I need one more step in order to have access to data
With a non-clustered index I could have a copy of part of the table and I get a quicker response time.
So, I got confused.
One more question is that when I have "SELECT *" which is the influence of a non-clustered index?
1st query :
Depending on the size of the data you might face lookup issues such as Key lookup and RID lookups .
2nd query :
It will be faster because it will not fetch columns that are not part of the index , though i recommend using covering index ..
I recommend you check this blog post
The first select will use the non-clustered index to find the clustering key [clustered index exists] or page and slot [no clustered index]. Then that will be used to get the row. The query plan will be different depending on your STATS (the data).
The second query is "covered" by the non-clustered index. What that means is that the non-clustered index contains all of the data that you are selecting. The clustering key is not needed, and the clustered index and/or heap is not needed to provide data to the select list.
I have a table mytable of 5 million records and a query that looks like
select *
from mytable
where column1 = 'value1' and column2 = 'value2' and column3 = 'value3'
So I thought about creating an index based on the 3 columns but my problem is that I have no best column to put in the first position of the index because there is no column that is really discrimating compared to the others.
Therefore I would like to build something similar to the hash tables with a hash code based on these 3 columns. I tried a function-based index based on the concatenation of those 3 columns but it's taking so long to create that I never got it created and I believe it's the wrong way to achieve what I want. What is the correct way to achieve this ?
Just create an index with three columns:
create idx_mytable_col1_col2_col3 on mytable(col1, col2, col3)
You have equality comparisons. The order of the columns in the index does not matter in this case.
Let the database do the work for you.
ASE's indexes are generally stored as b-trees, and while there's some hashing 'magic' that takes place during index searching, there's still a bit of traversal/searching required; if the first column of an index is not very selective then you can see some degradation in index search performance when compared to an index where the more selective column(s) is listed first; the difference in performance is really going to depend on the selectivity of the column(s) in question and the sheer size of the index (ie, number of index levels and pages that have to be read/processed).
If you're running ASE 15.0.3+, and you're running ASE on linux, you may want to take a look at virtually-hashed tables. In a nutshell ... ASE stores the PK index as a hash instead of the normal b-tree, with the net result being that index search times are reduced. There are quite a few requirements/restrictions on virtually-hashed tables so I suggest you take a look at your Transact-SQL User's Guide for more details.
Obviously (?) there's a good bit more to table/index design than can be discussed here; certainly not something that can be addressed by looking at a single, generic query. ("Duh, Mark!" ?)
So, we have a table, InventoryListItems, that has several columns. Because we're going to be looking for rows at times based on a particlar column (g_list_id, a foreign key), we have that foreign key column placed into a non-clustered index we'll call MYINDEX.
So when I search for data like this:
-- fake data for example
DECLARE #ListId uniqueidentifier
SELECT #ListId = '7BCD0E9F-28D9-4F40-BD67-803005179B04'
SELECT *
FROM [dbo].[InventoryListItems]
WHERE [g_list_id] = #ListId
I expected that it would use the MYINDEX index to find just the needed rows, and then look up the information in those rows. So not as good as just finding everything we need in the index itself, but still a big win over doing a full scan of the table.
But instead it seems that I'm still getting a clustered index scan. I can't figure out why that would happen.
If I do something like SELECTing only the values in the included columns of the index, it does what I would expect, an index seek, and just pulls everything from the index.
But if I SELECT *, why does it just bail on the index and do a scan when it seems like it would still benefit greatly from using it because it's referenced in the WHERE clause?
Since you're doing a SELECT * and thus you retrieve all columns, SQL Server's query optimizer may have decided it's easier and more efficient to just do a clustered index scan - since it needs to go to the clustered index leaf level to get all the columns anyway (and doing a seek first, and then a key lookup to actually get the whole data page, is quite an expensive operation - scan might just be more efficient in this setup).
I'm almost sure if you try
SELECT g_list_id
FROM [dbo].[InventoryListItems]
WHERE [g_list_id] = #ListId
then there will be an index seek instead (since you're only retrieving a single column - not everything).
That's one of the reasons why I would recommend to be extra careful when using SELECT * .... - try to avoid it if ever possible.
We have a sql query as follows
select * from Table where date < '20091010'
However when we look at the query plan, we see
The type of query is SELECT.
FROM TABLE
Worktable1.
Nested iteration.
Table Scan.
Forward scan.
Positioning at start of table.
Using I/O Size 32 Kbytes for data pages.
With MRU Buffer Replacement Strategy for data pages.
which seems to suggest that a full table scan is done. Why is the index not used?
If the majority of your dates are found by applying < '20091010' then the index may well be overlooked in favour of a table scan. What is your distribution of dates within that table? What is the cardinality? Is the index used if you only select date rather than select *?
Unless the index is covering *, the optimizer realizes that a table scan is probably more efficient than an index seek/scan and bookmark lookup. What's the expected selectivity of the date range? Do you have a primary key defined?