How does SQL Server treat indexes on a table behind a view? - sql

So I'm trying to understand how SQL Server makes use of indexes on tables behind views. Here's the scenario: Table A has a composite clustered index on fields 1 & 2 and a nonclustered index on fields 3 & 4.
View A is written against Table A to filter out additional fields, but fields 1-4 are part of the view. So we write a query that joins the view to another table on the nonclustered index fields.
The resulting query plan hits Table A with a clustered index scan (instead of the expected nonclustered index seek). However, if we replace the view in the FROM clause with the table, the query plan then hits the nonclustered index and we get the index seek we expected.
Shouldn't the SQL engine make use of the index on the table the view is constructed on? Since it doesn't, why not?

When you're thinking of non-materialized views and optimizations -- think of them like this:
The engine is "cutting and pasting" the view text into every query you perform.
OK, that's not exactly 100% true, but it's probably the most helpful way to think of what to expect in terms of performance.
Views can be tricky, though. People tend to think that just because a column is in a view, that it means something significant when it comes to query performance. The truth is, if the query which uses your view doesn't include a set of columns, it can be "optimized away". So if you were to SELECT every column from your base tables in your view, and then you were to only select one or two columns when you actually use the view, the query will be optimized considering only those two columns you select.
Another consequence of this is that you can use views to very aggressively flatten out table structures. So let's say for example I have the following schema:
Widget
-------
ID (UNIQUE)
Name
Price
WidgetTypeID (FK to WidgetType.ID)
WidgetType
----------
ID (UNIQUE)
Name
vw_Widgets
----------
SELECT w.ID, w.Name, w.Price, w.WidgetTypeID, wt.Name AS TypeName
FROM Widgets w
LEFT JOIN WidgetType wt
ON wt.ID = w.WidgetTypeID;
Note the LEFT JOIN in the view definition. If you were to simply SELECT Name, Price FROM vw_Widgets, you'd notice that WidgetType wasn't even involved in the query plan! It's completely optimized away! This works with LEFT JOINS across unique columns because the optimizer knows that since WidgetType's ID is UNIQUE, it won't generate any duplicate rows from the join. And since there's a FK, you know that you can leave the join as a LEFT join because you'll always have a corresponding row.
So the moral of the story here with views is that the columns you select at the end of the day are the ones that matter, not the ones in the view. Views aren't optimized when they're created -- they're optimized when they're used.
Your question isn't really about views
Your question is actually more generic -- why can't you use the NC index? I can't tell you really because I can't see your schema or your specific query, but suffice it to say that at a certain point, the optimizer sees that the cost of looking up the additional fields outweighs what it would have cost to scan the table (because seeks are expensive) and ignores your nonclustered index.

Related

Unindexed views v/s Queries

I'm creating a view which contains subquery as specified witht he following SQL query on SQL Server 2012.
CREATE VIEW [dbo].[VIEW_Detail] WITH SCHEMABINDING
AS
SELECT a.ID, a.Name1, a.Name2,
STUFF
((SELECT CAST(',' AS varchar(max)) + t .Name1
FROM dbo.Synonyms AS s
INNER JOIN dbo.Details AS t ON s.SynonymTSN = t .TSN
WHERE s.oID= a.ID FOR XML PATH('')), 1, 1, '') AS Synonym
FROM a.Details
WHERE (a.Rank <= 100)
Since the definition contains a subquery I'm not able to create an Indexed view. Will it be faster to use a query instead of the view to retrieve data if my tables are indexed.Or will an unindexed view will still perform better than using a query. The view currently contains more than 50,000 rows. What other query optimizations can I use?
PS: I don't care about performance on insert/update
A view is simply a single SELECT statement saved using a name (i.e View Name), There is no performance benefit using view over an ad-hoc query.
Yes Indexed views can increase the performance but they come with a longgggggggg list of limitations. As they are materialized and Also other queries which are not calling this indexed view but can benefit from indexes defined on this view will make use of these indexes.
In your case you have a sub-query and also using FOR XML clause, they both are not allowed inside an indexed view.
To optimize you query you need to look at the execution plan first and see if query is doing table or Clustered Index scans. Try adding some indexes and try to get a seek instead of a scan.
Looking at this query I think if you have Indexes on TSN, ID and RANK columns of dbo.Details table and SynonymTSN column of dbo.Synonyms table, it can improve the performance of this query.
On a side note 50,000 rows isn't really a big number of rows as long as you have primary keys defined on these two tables, you should get a reasonable performance with is pretty simple query.

Getting RID Lookup instead of Table Scan?

SQL Fiddle: http://sqlfiddle.com/#!3/23cf8
In this query, when I have an In clause on an Id, and then also select other columns, the In is evaluated first, and then the Details column and other columns are pulled in via a RID Lookup:
--In production and in SQL Fiddle, Details is grabbed via a RID Lookup after the In clause is evaluated
SELECT [Id]
,[ForeignId]
,Details
--Generate a numbering(starting at 1)
--,Row_Number() Over(Partition By ForeignId Order By Id Desc) as ContactNumber --Desc because older posts should be numbered last
FROM SupportContacts
Where foreignId In (1,2,3,5)
With this query, the Details are being pulled in via a Table Scan.
With NumberedContacts AS
(
SELECT [Id]
,[ForeignId]
--Generate a numbering(starting at 1)
,Row_Number() Over(Partition By ForeignId Order By Id Desc) as ContactNumber --Desc because older posts should be numbered last
FROM SupportContacts
Where ForeignId In (1,2,3,5)
)
Select nc.[Id]
,nc.[ForeignId]
,sc.[Details]
From NumberedContacts nc
Inner Join SupportContacts sc on nc.Id = sc.Id
Where nc.ContactNumber <= 2 --Only grab the last 2 contacts per ForeignId
;
In SqlFiddle, the second query actually gets a RID Lookup, whereas in production with a million records it produces a Table Scan (the IN clause eliminates 99% of the rows)
Otherwise the query plan shown in SQL Fiddle is identical, the only difference being that for the second query the RID Lookup in SQL Fiddle, is a Table Scan in production :(
I would like to understand possibilities that would cause this behavior? What kinds of things would you look at to help determine the cause of it using a table scan here?
How can I influence it to use a RID Lookup there?
From looking at operation costs in the actual execution plan, I believe I can get the second query very close in performance to the first query if I can get it to use a RID Lookup. If I don't select the Detail column, then the performance of both queries is very close in production. It is only after adding other columns like Detail that performance degrades significantly for the second query. When I put it in SQL Fiddle and saw that the execution plan used an RID Lookup, I was surprised but slightly confused...
It doesn't have a clustered index because in testing with different clustered indexes, there was slightly worse performance for this and other queries. That was before I began adding other columns like Details though, and I can experiment with that more, but would like to have a understanding of what is going on now before I start shooting in the dark with random indexes.
What if you would change your main index to include the Details column?
If you use:
CREATE NONCLUSTERED INDEX [IX_SupportContacts_ForeignIdAsc_IdDesc]
ON SupportContacts ([ForeignId] ASC, [Id] DESC)
INCLUDE (Details);
then neither a RID lookup nor a table scan would be needed, since your query could be satisfied from just the index itself....
The differences in the query plans will be dependent on the types of indexes that exist and the statistics of the data for those tables in the different environments.
The optimiser uses the statistics (histograms of data frequency, mostly) and the available indexes to decide which execution plan is going to be the quickest.
So, for example, you have noticed that the performance degrades when the 'Details' column is included. This is an almost sure sign that either the 'Details' column is not part of an index, or if it is part of an index, the data in that column is mostly unique such that the index accesses would be equivalent (or almost equivalent) to a table scan.
Often when this situation arises, the optimiser will choose a table scan over the index access, as it can take advantage of things like block reads to access the table records faster than perhaps a fragmented read of an index.
To influence the path that will be chose by the optimiser, you would need to look at possible indexes that could be added/modified to make an index access more efficient, but this should be done with care as it can adversely affect other queries as well as possibly degrading insert performance.
The other important activity you can do to help the optimiser is to make sure the table statistics are kept up to date and refreshed at a frequency that is appropriate to the rate of change of the frequency distribution in the table data
If it's true that 99% of the rows would be omitted if it performed the query using the relevant index + RID then the likeliest problem in your production environment is that your statistics are out of date and the optimiser doesn't realise that ForeignID in (1,2,3,5) would limit the result set to 1% of the total data.
Here's a good link for discovering more about statistics from Pinal Dave: http://blog.sqlauthority.com/2010/01/25/sql-server-find-statistics-update-date-update-statistics/
As for forcing the optimiser to follow the correct path WITHOUT updating the statistics, you could use a table hint - if you know the index that your plan should be using which contains the ID and ForeignID columns then stick that in your query as a hint and force SQL optimiser to use the index:
http://msdn.microsoft.com/en-us/library/ms187373.aspx
FYI, if you want the best performance from your second query, use this index and avoid the headache you're experiencing altogether:
create index ix1 on SupportContacts(ForeignID, Id DESC) include (Details);

Decision when to create Index on table column in database?

I am not db guy. But I need to create tables and do CRUD operations on them. I get confused should I create the index on all columns by default
or not? Here is my understanding which I consider while creating index.
Index basically contains the memory location range ( starting memory location where first value is stored to end memory location where last value is
stored). So when we insert any value in table index for column needs to be updated as it has got one more value but update of column
value wont have any impact on index value. Right? So bottom line is when my column is used in join between two tables we should consider
creating index on column used in join but all other columns can be skipped because if we create index on them it will involve extra cost of
updating index value when new value is inserted in column.Right?
Consider this scenario where table mytable contains two three columns i.e col1,col2,col3. Now we fire this query
select col1,col2 from mytable
Now there are two cases here. In first case we create the index on col1 and col2. In second case we don't create any index.** As per my understanding
case 1 will be faster than case2 because in case 1 we oracle can quickly find column memory location. So here I have not used any join columns but
still index is helping here. So should I consider creating index here or not?**
What if in the same scenario above if we fire
select * from mytable
instead of
select col1,col2 from mytable
Will index help here?
Don't create Indexes in every column! It will slow things down on insert/delete/update operations.
As a simple reminder, you can create an index in columns that are common in WHERE, ORDER BY and GROUP BY clauses. You may consider adding an index in colums that are used to relate other tables (through a JOIN, for example)
Example:
SELECT col1,col2,col3 FROM my_table WHERE col2=1
Here, creating an index on col2 would help this query a lot.
Also, consider index selectivity. Simply put, create index on values that has a "big domain", i.e. Ids, names, etc. Don't create them on Male/Female columns.
but update of column value wont have any impact on index value. Right?
No. Updating an indexed column will have an impact. The Oracle 11g performance manual states that:
UPDATE statements that modify indexed columns and INSERT and DELETE
statements that modify indexed tables take longer than if there were
no index. Such SQL statements must modify data in indexes and data in
tables. They also create additional undo and redo.
So bottom line is when my column is used in join between two tables we should consider creating index on column used in join but all other columns can be skipped because if we create index on them it will involve extra cost of updating index value when new value is inserted in column. Right?
Not just Inserts but any other Data Manipulation Language statement.
Consider this scenario . . . Will index help here?
With regards to this last paragraph, why not build some test cases with representative data volumes so that you prove or disprove your assumptions about which columns you should index?
In the specific scenario you give, there is no WHERE clause, so a table scan is going to be used or the index scan will be used, but you're only dropping one column, so the performance might not be that different. In the second scenario, the index shouldn't be used, since it isn't covering and there is no WHERE clause. If there were a WHERE clause, the index could allow the filtering to reduce the number of rows which need to be looked up to get the missing column.
Oracle has a number of different tables, including heap or index organized tables.
If an index is covering, it is more likely to be used, especially when selective. But note that an index organized table is not better than a covering index on a heap when there are constraints in the WHERE clause and far fewer columns in the covering index than in the base table.
Creating indexes with more columns than are actually used only helps if they are more likely to make the index covering, but adding all the columns would be similar to an index organized table. Note that Oracle does not have the equivalent of SQL Server's INCLUDE (COLUMN) which can be used to make indexes more covering (it's effectively making an additional clustered index of only a subset of the columns - useful if you want an index to be unique but also add some data which you don't want to be considered in the uniqueness but helps to make it covering for more queries)
You need to look at your plans and then determine if indexes will help things. And then look at the plans afterwards to see if they made a difference.

Unique index on two columns plus separate index on each one?

I don't know much about database optimization, but I'm trying to understand this case.
Say I have the following table:
cities
===========
state_id integer
name varchar(32)
slug varchar(32)
Now, say I want to perform queries like this:
SELECT * FROM cities WHERE state_id = 123 AND slug = 'some_city'
SELECT * FROM cities WHERE state_id = 123
If I want the "slug" for a city to be unique within its particular state, I'd add a unique index on state_id and slug.
Is that index enough? Or should I also add another on state_id so the second query is optimized? Or does the second query automatically use the unique index?
I'm working on PostgreSQL, but I feel this case is so simple that most DBMS work similarly.
Also, I know this surely doesn't make a difference on small tables, but my example is a simple one. Think of 200k+ rows tables.
Thanks!
A single unique index on (state_id, slug) should be sufficient. To be sure, of course, you'll need to run EXPLAIN and/or ANALYZE (perhaps with the help of something like http://explain.depesz.com/), but ultimately what indexes are appropriate depends very closely on what kind of queries you will be running. Remember, indexes make SELECTs faster and INSERTs, UPDATEs, and DELETEs slower, so you ideally want only as many indexes as are actually necessary.
Also, PostgreSQL has a smart query optimizer: it will use radically different search plans for queries on small tables and huge tables. If the table is small, it will just do a sequential scan and not even bother with any indexes, since the overhead of working with them is higher than just brute-force sifting through the table. This changes to a different plan once the table size passes a threshold, and may change again if the table gets larger again, or if you change your SELECT, or....
Summary: you can't trust the results of EXPLAIN and ANALYZE on datasets much smaller or different than your actual data. Make it work, then make it fast later (if you need to).
[EDIT: Misread the question... Hopefully my answer is more relevant now!]
In your case, I'd suggest 1 index on (state_id, slug). If you ever need to search just by slug, add an index on just that column. If you have those, then adding another index on state_id is unnecessary as the first index already covers it.
An index can be used whenever an initial segment of its columns are used in a WHERE clause. So e.g. an index on columns A, B and C will optimise queries containing WHERE clauses involving A, B and C, WHERE clauses with just A and B, or WHERE clauses with just A. Note that the order that columns appear in the index definition is very important -- this example index cannot be used for WHERE clauses involving just B and/or C.
(Of course it's up to the query optimiser whether or not a particular index actually gets used, but in your case with 200k rows, you can guarantee that a simple search by state_id or slug or both will use one of the indices.)
Any decent optimizer will see an index on three columns - say:
CREATE INDEX idx_1 ON SomeTable(Col1, Col2, Col3);
and will use that index for any of the following conditions:
WHERE Col1 = ...something...
WHERE Col1 = ...something... AND Col2 = ...otherthing...
WHERE Col3 = ....whatnot....
AND Col1 = ...something....
AND Col2 = ...otherthing...
That is, it will use the index if there are conditions applied to any contiguous leading subset of the columns of the index. Although I used equality, it can also apply to ranges (open - just greater than, for example) or closed (between two values).
To do optimization use EXPLAIN http://www.postgresql.org/docs/7.4/static/sql-explain.html and see for your self.
But optimization is not the most important reason to make those indexes; first it is a constraint inhibiting a database from not being logical.

What is a Covered Index?

I've just heard the term covered index in some database discussion - what does it mean?
A covering index is an index that contains all of, and possibly more, the columns you need for your query.
For instance, this:
SELECT *
FROM tablename
WHERE criteria
will typically use indexes to speed up the resolution of which rows to retrieve using criteria, but then it will go to the full table to retrieve the rows.
However, if the index contained the columns column1, column2 and column3, then this sql:
SELECT column1, column2
FROM tablename
WHERE criteria
and, provided that particular index could be used to speed up the resolution of which rows to retrieve, the index already contains the values of the columns you're interested in, so it won't have to go to the table to retrieve the rows, but can produce the results directly from the index.
This can also be used if you see that a typical query uses 1-2 columns to resolve which rows, and then typically adds another 1-2 columns, it could be beneficial to append those extra columns (if they're the same all over) to the index, so that the query processor can get everything from the index itself.
Here's an article: Index Covering Boosts SQL Server Query Performance on the subject.
Covering index is just an ordinary index. It's called "covering" if it can satisfy query without necessity to analyze data.
example:
CREATE TABLE MyTable
(
ID INT IDENTITY PRIMARY KEY,
Foo INT
)
CREATE NONCLUSTERED INDEX index1 ON MyTable(ID, Foo)
SELECT ID, Foo FROM MyTable -- All requested data are covered by index
This is one of the fastest methods to retrieve data from SQL server.
Covering indexes are indexes which "cover" all columns needed from a specific table, removing the need to access the physical table at all for a given query/ operation.
Since the index contains the desired columns (or a superset of them), table access can be replaced with an index lookup or scan -- which is generally much faster.
Columns to cover:
parameterized or static conditions; columns restricted by a parameterized or constant condition.
join columns; columns dynamically used for joining
selected columns; to answer selected values.
While covering indexes can often provide good benefit for retrieval, they do add somewhat to insert/ update overhead; due to the need to write extra or larger index rows on every update.
Covering indexes for Joined Queries
Covering indexes are probably most valuable as a performance technique for joined queries. This is because joined queries are more costly & more likely then single-table retrievals to suffer high cost performance problems.
in a joined query, covering indexes should be considered per-table.
each 'covering index' removes a physical table access from the plan & replaces it with index-only access.
investigate the plan costs & experiment with which tables are most worthwhile to replace by a covering index.
by this means, the multiplicative cost of large join plans can be significantly reduced.
For example:
select oi.title, c.name, c.address
from porderitem poi
join porder po on po.id = poi.fk_order
join customer c on c.id = po.fk_customer
where po.orderdate > ? and po.status = 'SHIPPING';
create index porder_custitem on porder (orderdate, id, status, fk_customer);
See:
http://literatejava.com/sql/covering-indexes-query-optimization/
Lets say you have a simple table with the below columns, you have only indexed Id here:
Id (Int), Telephone_Number (Int), Name (VARCHAR), Address (VARCHAR)
Imagine you have to run the below query and check whether its using index, and whether performing efficiently without I/O calls or not. Remember, you have only created an index on Id.
SELECT Id FROM mytable WHERE Telephone_Number = '55442233';
When you check for performance on this query you will be dissappointed, since Telephone_Number is not indexed this needs to fetch rows from table using I/O calls. So, this is not a covering indexed since there is some column in query which is not indexed, which leads to frequent I/O calls.
To make it a covered index you need to create a composite index on (Id, Telephone_Number).
For more details, please refer to this blog:
https://www.percona.com/blog/2006/11/23/covering-index-and-prefix-indexes/