Database covering a query [duplicate] - sql

I'm trying to understand what "covering a query" means, using a specific example.
If I have a table with, say, 3 columns:
Col1 Col2 Col3
and I put an index on Col1 and Col2,
is "covering a query" determined by the columns selected in the SELECT or by the columns in the WHERE clause?
Thus:
1) select Col1, Col2 from MyTable where Col3=XXX
2) Select Col3 from MyTable where Col1=xxx and Col2=yyy
3) Select Col1, Col2 from MyTable where Col1=xxx and Col2=yyy
Which of these three are truly "Covered"?

Only the third example is covered. To be covered, a query must be fully satisfied from the index. Your first example produces results that are entirely within the index, but it needs information that is not part of the index to complete, and so is not covered. To match your first example, you need an index that lists Col3 first.
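One way to check this yourself is to ask the engine for its plan. The sketch below is only an illustration and uses SQLite through Python's sqlite3 module as a stand-in (the question is about SQL Server, whose plans look different); the integer column types and the literal values 1 and 2 are assumptions made for the demo. SQLite labels a plan step "COVERING INDEX" when the query can be answered from the index alone.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the SQL Server table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Col1 INTEGER, Col2 INTEGER, Col3 INTEGER)")
conn.execute("CREATE INDEX ix_MyTable ON MyTable (Col1, Col2)")


def plan(sql):
    """Return SQLite's query plan for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable plan text is the fourth column of each row.
    return " | ".join(row[3] for row in rows)


# The three queries from the question, with placeholder literal values.
plan1 = plan("SELECT Col1, Col2 FROM MyTable WHERE Col3 = 1")
plan2 = plan("SELECT Col3 FROM MyTable WHERE Col1 = 1 AND Col2 = 2")
plan3 = plan("SELECT Col1, Col2 FROM MyTable WHERE Col1 = 1 AND Col2 = 2")

for label, text in (("1", plan1), ("2", plan2), ("3", plan3)):
    print(label, text)
```

In this sketch only the third plan should mention a covering index. The second uses the index to find rows but still has to visit the table for Col3, and the first cannot use the index for its WHERE clause at all.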
One important feature of indexes is the ability to include a set of columns in the index without actually indexing those columns. So an index for your table might look like this:
CREATE INDEX [ix_MyTable] ON [MyTable]
(
[Col1] ASC,
[Col2] ASC
)
INCLUDE ( [Col3])
Now samples 2 and 3 are both covered. Sample 1 is still not covered, because the index is still not useful for the WHERE clause.
Why INCLUDE Col3, rather than just listing it with the others? It's important to remember that as you add indexes or make them more complex, operations that change data using those indexes will require more and more work, because each change will also require updating the indexes. If you include a column in an index, without actually indexing it, an update to that column still needs to go back and update the index as well, so that the data in the index is accurate... but it doesn't also need to re-order the index based on the new value. So this saves some work for our database server. To put it another way, if a column will only be in the select list, and not in the where clause, you might get a small performance benefit by including it in an index to get the benefit of covering a query from the index, without actually indexing on the column.

It is not just the WHERE clause and SELECT clause. A GROUP BY clause also needs its columns to be covered by the index for it to be a covering index. Basically, to be a covering index, it needs to contain all the columns used in the query for a given table. However, if the columns are not in the right order, the index can't be used to seek to the matching rows.
If the column order in the index is (col1, col2, col3), then the index can't be used to seek for query one, since you are filtering by col3. Think of it like a phone book sorted by last name, then first name, then middle initial. Finding everyone with the last name Smith is easy; finding everyone with the first name John isn't helped by the sorting, so you have to read the whole phone book. Same for the index. Finding a col1 value is easy. Finding a col1 value and then col2 values is fine. Just finding col3 or just col2 is not helped by the index.
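The phone book analogy can be sketched in a few lines of Python. This is just a toy model, not how any database stores its indexes: the names are made up, and a sorted list with binary search (the bisect module) plays the role of the index sorted by (last name, first name, middle initial).

```python
import bisect

# Toy "phone book": sorted by last name, then first name, then middle initial.
# All entries are invented for the illustration.
book = sorted([
    ("Smith", "John", "A"),
    ("Adams", "John", "Q"),
    ("Smith", "Alice", "B"),
    ("Jones", "Mary", "C"),
    ("Young", "John", "D"),
])
last_names = [row[0] for row in book]  # leading "column" of the sort order


def by_last_name(name):
    """Binary search on the leading column: jumps straight to the range."""
    lo = bisect.bisect_left(last_names, name)
    hi = bisect.bisect_right(last_names, name)
    return book[lo:hi]


def by_first_name(name):
    """The sort order does not help here: every entry must be inspected."""
    return [row for row in book if row[1] == name]


print(by_last_name("Smith"))
print(by_first_name("John"))
```

Looking up a last name touches only a small slice of the list, while looking up a first name reads every entry. That is the same difference as seeking on col1 versus filtering on col2 or col3 alone.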

Related

Should the column used to order results be included in the index of a postgresql table?

I am creating indexes for a PostgreSQL database. I would like to know whether the column(s) used to order results in a PostgreSQL statement should be included in the index.
Assume I have created a table with label 'table1' in a PostgreSQL database with columns labelled 'col1', 'col2' and 'col3'.
I would like to execute the following query:
SELECT * FROM table1 WHERE col1 = 'word1' AND col2 = 'word2' ORDER BY col3;
I know that an index for this search should include all columns referenced in the WHERE clause so, in this case, the index would include col1 and col2.
Should the index also include col3?
Because you have equality comparisons, Postgres should be able to use an index on (col1, col2, col3).
The first two columns are used for the where clause; the last for the order by.
Note that this is very specifically for your query. And it assumes that the collations on the strings are compatible and there is no type conversion. Also, the comparisons in the where need to be equality comparisons for the index to be used for the order by.
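I believe that the direction of the order by matters less than it might seem: Postgres can scan a B-tree index backward, so an index on (col1, col2, col3) can serve both ORDER BY col3 and ORDER BY col3 DESC here. The direction defined in the index matters mainly when the order by mixes ascending and descending columns.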
I have found that the MySQL documentation on multi-column indexes is a good introduction to the topic. It focuses on the where clause, but it gives a good flavor of when indexes can and cannot be used -- and the rules tend to be similar across databases.
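The effect of adding col3 to the index can be seen in a query plan. The sketch below uses SQLite via Python's sqlite3 module purely as a stand-in; PostgreSQL's planner and its EXPLAIN output are different, so treat this as an illustration of the idea rather than of Postgres behavior. The TEXT column types and the index name idx are assumptions. SQLite reports a separate sorting step in its plan when the index cannot deliver the rows in ORDER BY order.

```python
import sqlite3

QUERY = (
    "SELECT * FROM table1 "
    "WHERE col1 = 'word1' AND col2 = 'word2' ORDER BY col3"
)


def plan_for(index_columns):
    """Build table1 with one index on `index_columns` and return the plan text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (col1 TEXT, col2 TEXT, col3 TEXT)")
    conn.execute(f"CREATE INDEX idx ON table1 ({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    # The human-readable plan text is the fourth column of each row.
    return " | ".join(row[3] for row in rows)


plan_two_cols = plan_for("col1, col2")          # WHERE columns only
plan_three_cols = plan_for("col1, col2, col3")  # WHERE columns plus ORDER BY column

print("(col1, col2):      ", plan_two_cols)
print("(col1, col2, col3):", plan_three_cols)
```

With the two-column index the plan should contain an extra sort step for the ORDER BY; with the three-column index that step should disappear, because rows matching the two equality conditions are already stored in col3 order.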

Query takes too much time when added new column to table and set index on it

I have added a new column to a table that contains hundreds of thousands of records, and created a composite index on three columns (the newly added one plus two existing columns).
For example, table TBL has two columns, say col1 and col2.
I added a new column col3 to TBL and created a composite index on (col3, col1, col2).
Right now col3 is NULL for all records. When I select from this table, it takes too much time.
Any idea what I am doing wrong? I checked the query plan and it is using the index.
Using an index is quite expensive when the table has a small number of rows or when too many index entries share the same value (here, every col3 is NULL).
Check the costs in the query plan, not just whether the index is used.
Also, it seems that you are adding new rows on the fly (the NULLs), which might suggest that your schema is denormalized.
It was resolved by gathering statistics using
DBMS_STATS.GATHER_TABLE_STATS
Thanks @Jan for your input.

What's the difference between these T-SQL queries (one uses INCLUDE)? [duplicate]

While studying for the 70-433 exam I noticed you can create a covering index in one of the following two ways.
CREATE INDEX idx1 ON MyTable (Col1, Col2, Col3)
-- OR --
CREATE INDEX idx1 ON MyTable (Col1) INCLUDE (Col2, Col3)
The INCLUDE clause is new to me. Why would you use it and what guidelines would you suggest in determining whether to create a covering index with or without the INCLUDE clause?
Use INCLUDE when the column is not in the WHERE/JOIN/GROUP BY/ORDER BY, but only in the column list of the SELECT clause.
The INCLUDE clause adds the data at the lowest/leaf level, rather than in the index tree.
This makes the index smaller because it's not part of the tree
INCLUDE columns are not key columns in the index, so they are not ordered.
This means it isn't really useful for predicates, sorting etc as I mentioned above. However, it may be useful if you have a residual lookup in a few rows from the key column(s)
Another MSDN article with a worked example
You would use the INCLUDE to add one or more columns to the leaf level of a non-clustered index, if by doing so, you can "cover" your queries.
Imagine you need to query for an employee's ID, department ID, and lastname.
SELECT EmployeeID, DepartmentID, LastName
FROM Employee
WHERE DepartmentID = 5
If you happen to have a non-clustered index on (EmployeeID, DepartmentID), once you find the employees for a given department, you now have to do a "bookmark lookup" to get the actual full employee record, just to get the LastName column. That can get pretty expensive in terms of performance if you find a lot of employees.
If you had included that lastname in your index:
CREATE NONCLUSTERED INDEX NC_EmpDep
ON Employee(DepartmentID)
INCLUDE (Lastname, EmployeeID)
then all the information you need is available in the leaf level of the non-clustered index. Just by seeking in the non-clustered index and finding your employees for a given department, you have all the necessary information, and the bookmark lookup for each employee found in the index is no longer necessary --> you save a lot of time.
Obviously, you cannot include every column in every non-clustered index - but if you do have queries which are missing just one or two columns to be "covered" (and that get used a lot), it can be very helpful to INCLUDE those into a suitable non-clustered index.
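The difference between "index plus lookup" and "covered" can be observed in a query plan. The sketch below is a rough illustration using SQLite via Python's sqlite3 module. SQLite has no INCLUDE clause, so LastName and EmployeeID are simply appended as trailing key columns to stand in for it; the Salary column and the index name NC_DepOnly are hypothetical additions made so that the table is wider than the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Salary is a hypothetical extra column, not part of the original example.
conn.execute(
    "CREATE TABLE Employee ("
    "EmployeeID INTEGER, DepartmentID INTEGER, LastName TEXT, Salary INTEGER)"
)

QUERY = (
    "SELECT EmployeeID, DepartmentID, LastName "
    "FROM Employee WHERE DepartmentID = 5"
)


def plan():
    """Return SQLite's current plan for QUERY as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    return " | ".join(row[3] for row in rows)


# Index on the search column only: rows are found via the index,
# but LastName and EmployeeID must still be fetched from the table.
conn.execute("CREATE INDEX NC_DepOnly ON Employee (DepartmentID)")
plan_lookup = plan()

# Stand-in for INCLUDE (Lastname, EmployeeID): SQLite lacks INCLUDE,
# so the extra columns are added as trailing key columns instead.
conn.execute("DROP INDEX NC_DepOnly")
conn.execute(
    "CREATE INDEX NC_EmpDep ON Employee (DepartmentID, LastName, EmployeeID)"
)
plan_covered = plan()

print("search column only:", plan_lookup)
print("with extra columns:", plan_covered)
```

Only the second plan should be reported as using a covering index. In SQL Server the equivalent check is whether the execution plan still shows a key or bookmark lookup next to the index seek.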
This discussion is missing out on the important point: The question is not if the "non-key-columns" are better to include as index-columns or as included-columns.
The question is how expensive it is to use the include-mechanism to include columns that are not really needed in index? (typically not part of where-clauses, but often included in selects). So your dilemma is always:
1) Use an index on id1, id2 ... idN alone, or
2) Use an index on id1, id2 ... idN plus include col1, col2 ... colN
Where:
id1, id2 ... idN are columns often used in restrictions and col1, col2 ... colN are columns often selected, but typically not used in restrictions
(The option to include all of these columns as part of the index key is just always silly, unless they are also used in restrictions, because it would always be more expensive to maintain: the index must be updated and re-sorted even when the "keys" have not changed.)
So use option 1 or 2?
Answer: If your table is rarely updated - mostly inserted into/deleted from - then it is relatively inexpensive to use the include-mechanism to include some "hot columns" (that are often used in selects - but not often used on restrictions) since inserts/deletes require the index to be updated/sorted anyway and thus little extra overhead is associated with storing off a few extra columns while already updating the index. The overhead is the extra memory and CPU used to store redundant info on the index.
If the columns you consider to add as included-columns are often updated (without the index-key-columns being updated) - or - if it is so many of them that the index becomes close to a copy of your table - use option 1 I'd suggest! Also if adding certain include-column(s) turns out to make no performance-difference - you might want to skip the idea of adding them:) Verify that they are useful!
The average number of rows per same values in keys (id1, id2 ... idN) can be of some importance as well.
Note that if an included column is used in a restriction, then as long as the index can be used at all (based on restrictions against the index key columns), SQL Server evaluates that restriction against the leaf-level values in the index instead of taking the expensive route through the table itself.
Basic index columns are sorted, but included columns are not sorted. This saves resources in maintaining the index, while still making it possible to provide the data in the included columns to cover a query. So, if you want to cover queries, you can put the search criteria to locate rows into the sorted columns of the index, but then "include" additional, unsorted columns with non-search data. It definitely helps with reducing the amount of sorting and fragmentation in index maintenance.
One reason to prefer INCLUDE over key columns, if you don't need that column in the key, is documentation. That makes evolving indexes much easier in the future.
Considering your example:
CREATE INDEX idx1 ON MyTable (Col1) INCLUDE (Col2, Col3)
That index is best if your query looks like this:
SELECT col2, col3
FROM MyTable
WHERE col1 = ...
Of course you should not put columns in INCLUDE if you can get an additional benefit from having them in the key part. Both of the following queries would actually prefer the col2 column in the key of the index.
SELECT col2, col3
FROM MyTable
WHERE col1 = ...
AND col2 = ...
SELECT TOP 1 col2, col3
FROM MyTable
WHERE col1 = ...
ORDER BY col2
Let's assume this is not the case and we have col2 in the INCLUDE clause because there is just no benefit of having it in the tree part of the index.
Fast forward some years.
You need to tune this query:
SELECT TOP 1 col2
FROM MyTable
WHERE col1 = ...
ORDER BY another_col
To optimize that query, the following index would be great:
CREATE INDEX idx1 ON MyTable (Col1, another_col) INCLUDE (Col2)
If you check what indexes you have on that table already, your previous index might still be there:
CREATE INDEX idx1 ON MyTable (Col1) INCLUDE (Col2, Col3)
Now you know that Col2 and Col3 are not part of the index tree and are thus not used to narrow the read index range nor for ordering the rows. It is rather safe to add another_col to the end of the key part of the index (after Col1). There is little risk of breaking anything:
DROP INDEX idx1 ON MyTable;
CREATE INDEX idx1 ON MyTable (Col1, another_col) INCLUDE (Col2, Col3);
That index will become bigger, which still has some risks, but it is generally better to extend existing indexes compared to introducing new ones.
If you had an index without INCLUDE, you could not know which queries you would break by adding another_col right after Col1.
CREATE INDEX idx1 ON MyTable (Col1, Col2, Col3)
What happens if you add another_col between Col1 and Col2? Will other queries suffer?
There are other "benefits" of INCLUDE vs. key columns if you add those columns just to avoid fetching them from the table. However, I consider the documentation aspect the most important one.
To answer your question:
what guidelines would you suggest in determining whether to create a covering index with or without the INCLUDE clause?
If you add a column to the index for the sole purpose to have that column available in the index without visiting the table, put it into the INCLUDE clause.
If adding the column to the index key brings additional benefits (e.g. for order by or because it can narrow the read index range) add it to the key.
You can read a longer discussion about this here:
https://use-the-index-luke.com/blog/2019-04/include-columns-in-btree-indexes
The reasons why (including the data in the leaf level of the index) have been nicely explained. The reason that you give two shakes about this, is that when you run your query, if you don't have the additional columns included (new feature in SQL 2005) the SQL Server has to go to the clustered index to get the additional columns which takes more time, and adds more load to the SQL Server service, the disks, and the memory (buffer cache to be specific) as new data pages are loaded into memory, potentially pushing other more often needed data out of the buffer cache.
An additional consideration that I have not seen in the answers already given is that included columns can be of data types that are not allowed as index key columns, such as varchar(max).
This allows you to include such columns in a covering index. I recently had to do this to provide a nHibernate generated query, which had a lot of columns in the SELECT, with a useful index.
There is a limit to the total size of all columns inlined into the index definition. That said though, I have never had to create index that wide.
To me, the bigger advantage is the fact that you can cover more queries with one index that has included columns, as they don't have to be defined in any particular order. Think about it as an index within the index.
One example would be the StoreID (where StoreID is low selectivity meaning that each store is associated with a lot of customers) and then customer demographics data (LastName, FirstName, DOB):
If you just inline those columns in this order (StoreID, LastName, FirstName, DOB), you can only efficiently search for customers for which you know StoreID and LastName.
On the other hand, defining the index on StoreID and including the LastName, FirstName, DOB columns would let you seek on StoreID and then filter on any of the included columns as a residual predicate, all within the index. Included columns are not sorted, so the second step is a filter over that store's index entries rather than a true seek. This would let you cover all possible search permutations as long as they start with StoreID.

Cost of adding an index when another already exists in the same order

Working with DB2 10 on z/os. My question is if adding an index on a column would have the normal cost of adding an index if there is already another index (non-clustered) on a col concatenation of my column and other column(s), e.g. want to add an index to col1 when an index exists on col4, which is a concatenation of col1 and col2.
In case you're curious about the situation: we created some tables when converting from another database, and the keys were on combined fields. To mimic the old keys (and so not rewrite our whole system), but have these fields split out so they are useful, we have tables with all the old individual columns and some new columns for the keys, which are created by triggers (on insert) by concatenating some columns, n.b. when they are not equal to spaces, and these new columns are indexed.
So, for example a table has col1 (char), col2 (char), and col3 and creates indexed col4 as a concatenation of col1 and col2 on insert.
This was done so col4 would match our old database, e.g. doesn't exist if col1 or col2 are blank.
Well, the downside was that this was done as a blanket rule and on some tables col1 and col2 are never blank. So an index on col1, col2 would be the same...and is actually preferred because sometimes we only want to search by col1 (not w/ col2)...and especially use it in joins to other tables.
So...in that case, does db2 gain any advantage from the non-clustered index that is pretty much the same thing?
As far as DB2 is concerned, they're completely separate columns (which they are), and the costs of adding an index (updating the index for INSERTs, UPDATEs, and DELETEs) cannot be "short-circuited" just because you have an index on another column that just happens to be the result of a concatenation of two other columns.
If you still query on col4, I would leave an index on that. I would then add a new index on (col1, col2).

index with multiple columns - ok when doing query on only one column?

If I have an table
create table sv ( id integer, data text )
and an index:
create index myindex_idx on sv (id,text)
would this still be useful if I did a query
select * from sv where id = 10
My reason for asking is that I'm looking through a set of tables without any indexes, and seeing different combinations of select queries. Some use just one column, others use more than one. Do I need to have indexes for both sets, or is an all-inclusive index ok?
I am adding the indexes for faster lookups than full table scans.
Example (based on the answer by Matt Huggins):
select * from table where col1 = 10
select * from table where col1 = 10 and col2=12
select * from table where col1 = 10 and col2=12 and col3 = 16
could all be covered by index table (col1,col2,col3) but
select * from table where col2=12
would need another index?
It should be useful since an index on (id, text) first indexes by id, then text respectively.
If you query by id, this index will be used.
If you query by id & text, this index will be used.
If you query by text, this index will NOT be used.
Edit: when I say it's "useful", I mean it's useful in terms of query speed/optimization. As Sune Rievers pointed out, it will not mean you will get a unique record given just ID (unless you specify ID as unique in your table definition).
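The three cases above can be checked against a query plan. The sketch below uses SQLite via Python's sqlite3 module as a stand-in (the thread is about Oracle, whose optimizer has extra options such as skip scans), and it indexes the column as declared in the table, data, where the answer says text. In SQLite's plan output, SEARCH means the index was used to jump to matching rows and SCAN means every entry was read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sv (id INTEGER, data TEXT)")
# The question's index names the second column "text"; the table declares
# it as "data", so that name is used here.
conn.execute("CREATE INDEX myindex_idx ON sv (id, data)")


def plan(sql):
    """Return SQLite's query plan for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


by_id = plan("SELECT * FROM sv WHERE id = 10")
by_id_and_data = plan("SELECT * FROM sv WHERE id = 10 AND data = 'x'")
by_data_only = plan("SELECT * FROM sv WHERE data = 'x'")

print("id only:    ", by_id)
print("id and data:", by_id_and_data)
print("data only:  ", by_data_only)
```

The first two plans should start with SEARCH and the last with SCAN. A scan may still read the index instead of the table, but it reads all of it, which is the sense in which the index is "NOT used" for a lookup on the second column alone.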
Oracle supports a number of ways of using an index, and you ought to start by understanding all of them so have a quick read here: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/optimops.htm#sthref973
Your query select * from table where col2=12 could usefully leverage an index skip scan if the leading column is of very low cardinality, or a fast full index scan if it is not. These would probably be fine for running reports, however for an OLTP query it is likely that you would do better to create an index with col2 as the leading column.
I assume id is primary key. There is no point in adding a primary key to the index, as this will always be unique. Adding something unique to something else will also be unique.
Add a unique index to text if you really need it; otherwise just use id for uniqueness on the table.
If id is not your primary key, then you will not be guaranteed to get a unique result from your query.
Regarding your last example with the lookup on col2, I think you would need another index. Indexes are not a cure-all for performance problems, though; sometimes your database design or your queries need to be optimized, for instance rewritten into stored procedures (while I'm not totally sure Oracle has them, I'm sure there's an Oracle equivalent).
If the driver behind your question is that you have a table with several columns and any combination of these columns may be used in a query, then you should look at BITMAP indexes.
Looking at your example:
select * from mytable where col1 = 10 and col2=12 and col3 = 16
You could create 3 bitmap indexes:
create bitmap index ix_mytable_col1 on mytable(col1);
create bitmap index ix_mytable_col2 on mytable(col2);
create bitmap index ix_mytable_col3 on mytable(col3);
These bitmap indexes have the great benefit that they can be combined as required.
So, each of the following queries would use one or more of the indexes:
select * from mytable where col1 = 10;
select * from mytable where col2 = 10 and col3 = 16;
select * from mytable where col3 = 16;
So, bitmap indexes may be an option for you. However, as David Aldridge pointed out, depending on your particular data set a single index on (col1,col2,col3) might be preferable. As ever, it depends. Take a look at your data, the likely queries against that data, and make sure your statistics are up to date.
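To show what "combined as required" means, here is a toy model in Python. It is not how Oracle implements bitmap indexes (real ones are compressed and stored on disk); it only illustrates the idea that each distinct value gets a bitmap of matching rows and that conditions are combined with a bitwise AND. The sample rows and the function names build_bitmap_index and lookup are invented for the example.

```python
from collections import defaultdict

# Invented sample data; a row's position in the list is its row id.
rows = [
    {"col1": 10, "col2": 12, "col3": 16},  # row 0
    {"col1": 10, "col2": 10, "col3": 16},  # row 1
    {"col1": 11, "col2": 12, "col3": 16},  # row 2
    {"col1": 10, "col2": 12, "col3": 17},  # row 3
]


def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i has it."""
    index = defaultdict(int)
    for row_id, row in enumerate(rows):
        index[row[column]] |= 1 << row_id
    return index


# One independent single-column bitmap index per column.
indexes = {c: build_bitmap_index(c) for c in ("col1", "col2", "col3")}


def lookup(**conditions):
    """AND together one bitmap per condition and return the matching row ids."""
    bits = (1 << len(rows)) - 1  # start with every row selected
    for column, value in conditions.items():
        bits &= indexes[column].get(value, 0)
    return [i for i in range(len(rows)) if bits >> i & 1]


print(lookup(col1=10, col2=12, col3=16))
print(lookup(col2=10, col3=16))
print(lookup(col3=16))
```

Any subset of the three columns can be queried with the same three small indexes, which is the property that makes this approach attractive when the combination of columns is not known in advance.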
Hope this helps.