unique indexes and include statements - sql

create unique index
In DB2 UDB I can create an index using the following syntax
create unique index I_0004 on TABLENAME (a) INCLUDE (b, c, d);
where a, b, c and d are fields of the table TABLENAME.
In DB2 for os390 this syntax (the INCLUDE keyword) is not allowed, so I am creating the indexes as follows
create unique index I_0004 on TABLENAME (a);
create index I_0005 on TABLENAME (a, b, c, d);
Are the two statements above equivalent to the solution with the INCLUDE keyword?
index columns order
And, if I slightly modify the first statement
create index I_0005 on TABLENAME (a, b, c, d) ALLOW REVERSE SCANS;
is this ALLOW REVERSE SCANS equivalent to creating indexes
create index I_0005 on TABLENAME (a, b, c, d);
create index I_0006 on TABLENAME (d, c, b, a);
or does it consider also any combination of the given columns (I mean, a,b,c,d; b,c,d,a; c,d,a,b; and so on...)?

Regarding the UNIQUE INDEX: roughly, yes. A unique index on (a) that includes (b, c, d) is equivalent to a unique index on (a) alone plus a non-unique index on (a, b, c, d), except of course that, internally, the database engine may be able to use less space, etc.
Regarding ALLOW REVERSE SCANS: no, an index on (a, b) that can be reverse-scanned is not equivalent to one that can't plus one on (b, a) -- rather, an index that can be reverse scanned is equivalent to one that can't plus another on the same columns where each ASC becomes a DESC and vice versa (and ASC is the default when you don't specify).
Note that since DB2 9.1 reverse scans are allowed by default, see http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.rn.doc/doc/c0023548.htm (and, I believe DB2 V8 is now out of support, see http://www-01.ibm.com/support/docview.wss?rs=71&uid=swg21370360 -- I think V9.5 is the current version).
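The reverse-scan point can be checked with a small model. This is only a sketch in Python, not DB2 code: it assumes an index can be pictured as the sorted list of its key tuples, and the sample (a, b) values are made up.

```python
# Toy model: an index is the list of its key tuples in sorted order.
rows = [(1, 2), (1, 5), (2, 1), (2, 3), (3, 2)]  # made-up (a, b) values

# Index on (a ASC, b ASC).
idx_ab_asc = sorted(rows)

# Index on (a DESC, b DESC): every sort direction flipped.
idx_ab_desc = sorted(rows, reverse=True)

# Index on (b ASC, a ASC): column order swapped, entries still shown as (a, b).
idx_ba_asc = sorted(rows, key=lambda r: (r[1], r[0]))

# Reading the ascending index from the end is a "reverse scan".
reverse_scan = idx_ab_asc[::-1]

same_as_flipped = reverse_scan == idx_ab_desc  # same order as the DESC index
same_as_swapped = reverse_scan == idx_ba_asc   # different from the (b, a) index
print(same_as_flipped, same_as_swapped)
```

In this model the reverse scan reproduces the all-DESC ordering and not the ordering of the swapped-column index, which matches the answer above.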

Related

Composite Indexes, the “Include” Keyword, and How They Work

In SQL Server (and most other relational databases), a "Composite Index" is an index with multiple keys. Let's say we have this query that gets run a lot, and we want to create a covering index for this query to speed it up:
SELECT a, b FROM MyTable WHERE c = @val1 AND d = @val2
These are all possible composite indexes that would cover this query:
CREATE INDEX ix1 ON MyTable (c, d, a, b)
CREATE INDEX ix2 ON MyTable (c, d) INCLUDE (a, b)
CREATE INDEX ix3 ON MyTable (d) INCLUDE (a, b, c)
CREATE INDEX ix4 ON MyTable (c) INCLUDE (a, b, d)
But apparently, they don't perform equally. According to Erland Sommarskog (Microsoft MVP), the first two are faster than the 3rd and 4th, and the 4th is faster than the 3rd.
He goes on to explain:
ix2 is the "best" index, because a and b will not take up space in the higher levels of the index tree. Also, if a or b are updated, in ix2 there can be no page splits or similar as the index tree is unaffected.
However, I am having a hard time grasping what exactly is going on. I do have the general knowledge on b-tree indexes and how they work, but I don't understand the logic behind composite keys. For example;
CREATE INDEX ix1 ON MyTable (c, d, a, b)
Does the order of the columns here matter? If so, why? Also;
CREATE INDEX ix2 ON MyTable (c, d) INCLUDE (a, b)
What is the difference between this composite key and the one above? I don't understand what difference "INCLUDE" makes.
Note: I know there are a lot of posts on Composite Keys, but I believe my last two questions are specific enough to not be a duplicate.
Does the order of the columns here matter?
Considering only the query in your question with 2 equality predicates, the order of the composite index key columns doesn't matter as long as both are the leftmost key columns of the composite index. Any of the covering indexes below will optimize this query:
CREATE INDEX ix1 ON MyTable (c, d, a, b);
CREATE INDEX ix2 ON MyTable (c, d) INCLUDE (a, b);
CREATE INDEX ix3 ON MyTable (d, c, a, b);
CREATE INDEX ix4 ON MyTable (d, c, b, a);
CREATE INDEX ix5 ON MyTable (d, c) INCLUDE (a, b);
That said, the stats histogram contains only the leftmost index key column so the general guidance is to specify the most selective column first to improve row count estimates and execution plan quality. This consideration is more important for non-trivial queries where the optimizer has many choices and row count estimates are an important factor in choosing the best plan.
Another consideration for key order, which may conflict with the above general guidance, is when the index supports different queries and only some of the key columns are specified (e.g. SELECT a, b FROM MyTable WHERE d = @val2;). In that case, it would be better to specify d as the leftmost column regardless of selectivity, so that a single index can optimize multiple queries instead of creating a separate index for the second query.
What is the difference between this composite key and the one above? I don't understand what difference "INCLUDE" makes.
Included columns are not key columns. Key columns are maintained in logical order at every level throughout the b-tree whereas included columns are present only in the b-tree leaf nodes and not ordered. Consequently, the specified order of included columns does not matter. The only purpose of included columns is to help cover queries without adding them as key columns and incurring the associated overhead.
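The split between key columns and included columns can be pictured with a toy model. This is a sketch in Python rather than SQL Server internals: the table rows are made up, and the index is modelled as a list of (key, included) entries ordered by key only, standing in for the leaf level of ix2.

```python
import bisect

# Hypothetical table rows as (a, b, c, d). Values are made up.
table = [
    ("x1", "y1", 10, 7),
    ("x2", "y2", 10, 8),
    ("x3", "y3", 20, 7),
    ("x4", "y4", 10, 7),
]

# Model of: CREATE INDEX ix2 ON MyTable (c, d) INCLUDE (a, b)
# Each entry is (key, included). Entries are ordered by the key alone;
# the included values just ride along and play no part in the ordering.
ix2 = sorted((((c, d), (a, b)) for a, b, c, d in table), key=lambda e: e[0])
keys = [key for key, _ in ix2]

def covered_lookup(c, d):
    """Answer SELECT a, b ... WHERE c = ? AND d = ? from the index alone."""
    lo = bisect.bisect_left(keys, (c, d))
    hi = bisect.bisect_right(keys, (c, d))
    return [included for _, included in ix2[lo:hi]]

result = covered_lookup(10, 7)
print(result)
```

The lookup never touches `table`: the seek uses only the key, and the selected values come from the included part of each matching entry.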
CREATE INDEX ix1 ON MyTable (c, d, a, b)
Does the order of the columns here matter? If so, why? Also;
Yes, column order is very important when creating an index. Each key column, from left to right, is the next level of ordering within the index, so for the optimizer to seek on this index your query must always filter on c, which is the leading column of the key.
CREATE INDEX ix2 ON MyTable (c, d) INCLUDE (a, b)
What is the difference between this composite key and the one above? I don't understand what difference "INCLUDE" makes.
But keep in mind that each additional key column makes the index less efficient. If you know that more than 80% of your queries will seek only by c and d and not by a and b, but you still need a and b in your SELECT list (not in WHERE), you should INCLUDE them so that they are stored only in the leaf level of the index.
There are better explanations than mine so feel free to look at them:
INCLUDE equivalent in Oracle -> INCLUDE
How important is the order of columns in indexes? -> ORDER in INDEX set

Using dummy condition to make sure multi-column index is used?

Let's say I have a database table with a multi-column index on columns (A, B, C). I want to do a
SELECT ... WHERE C BETWEEN c1 AND c2
which is slow because the index does not get used.
Does it make any sense to try and 'fool' SQL Server into using the index by including dummy conditions on A and B? I.e.:
SELECT ... WHERE ((A >= MIN_VALUE) OR A IS NULL)
AND ((B >= MIN_VALUE) OR B IS NULL)
AND (C BETWEEN c1 AND c2)
I cannot modify the table in any way.
A multi-column index on (A, B, C) is not going to be used for a condition that is only on C.
Your attempt to "fool" SQL Server indicates that you don't fully understand how indexes work. Indexes are applied to conditions up-to and including the first inequality (actually, anything other than = or is null).
MySQL actually has a pretty good explanation of how multi-column indexes are used. What it says is generally true across databases (although some databases -- but not SQL Server -- have an additional operation called skip-scan which could be used in some additional cases).
Your index could be used for:
where A = @A and B = @B and C between @c1 and @c2
An index is not going to be used if c1 and c2 are columns in a table. (Well, it might be used in a full index scan but not a lookup/seek.)
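To see why the leading columns matter, here is a sketch in Python (not SQL Server) that models the (A, B, C) index as a sorted list of key tuples. The data and the function names are made up for illustration, and the count of "touched" entries ignores the few probes made by the binary search.

```python
import bisect
from itertools import product

# Index on (A, B, C): all key tuples kept in sorted order. Data is made up.
index_abc = sorted(product(range(10), range(10), range(10)))  # 1000 entries

def seek_eq_eq_range(a, b, c1, c2):
    """WHERE A = a AND B = b AND C BETWEEN c1 AND c2: one contiguous slice."""
    lo = bisect.bisect_left(index_abc, (a, b, c1))
    hi = bisect.bisect_right(index_abc, (a, b, c2))
    matches = index_abc[lo:hi]
    return matches, len(matches)  # entries touched equals entries returned

def scan_range_on_c(c1, c2):
    """WHERE C BETWEEN c1 AND c2: matches are scattered, so read every entry."""
    touched = 0
    matches = []
    for entry in index_abc:
        touched += 1
        if c1 <= entry[2] <= c2:
            matches.append(entry)
    return matches, touched

seek_matches, seek_touched = seek_eq_eq_range(3, 4, 2, 5)
scan_matches, scan_touched = scan_range_on_c(2, 5)
print(seek_touched, scan_touched, len(scan_matches))
```

With equality on A and B, the matching entries sit next to each other and can be located directly. With a condition on C alone, the matches are spread across the whole index, so every entry has to be read.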

Best indexes to create when a search may be on any or all of 3 fields

I am working on a new search function in our software where a user will be allowed to search on any or all of 3 possible fields A, B and C. I expect that if anything is entered for a field it will be a complete entry and not just a partial one.
So the user choices are
Field A, or
Field B, or
Field C, or
Fields A & B, or
Fields A & C, or
Fields B & C, or
Fields A, B & C.
My question is what indexes should be created on this table to provide maximum performance? This will be running on SQL Server 2005 and up I expect and a good user experience is essential.
This is difficult to answer without knowing your data or its usage. Hopefully A, B, and C are not long string data types. If you have minimal Insert/Update/Delete activity and/or will sacrifice everything for index usage, I would create an index on each of these:
A, B , C <<<handles queries for: A, or A & B, or A, B & C
A, C <<<handles queries for: A & C
B, C <<<handles queries for: B, or B & C
C <<<handles queries for: C
These should cover all combinations you have mentioned.
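The coverage claim can be checked mechanically. The sketch below is in Python and assumes the simple rule that an index can be seeked for a set of equality predicates when those columns are exactly its leading key columns, in any order. The function name `seekable_index` is made up for this example.

```python
from itertools import combinations

# The four indexes proposed above, as ordered key-column tuples.
indexes = [("A", "B", "C"), ("A", "C"), ("B", "C"), ("C",)]

def seekable_index(search_cols, indexes):
    """Return the first index whose leading columns are exactly search_cols
    (in any order), or None. Assumes an equality predicate on every column."""
    wanted = set(search_cols)
    for idx in indexes:
        if set(idx[:len(wanted)]) == wanted:
            return idx
    return None

fields = ("A", "B", "C")
coverage = {
    combo: seekable_index(combo, indexes)
    for n in range(1, len(fields) + 1)
    for combo in combinations(fields, n)
}
for combo, idx in coverage.items():
    print(combo, "->", idx)
```

Under that rule, each of the seven search combinations from the question maps to one of the four indexes, in line with the annotations in the list above.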
You will also need to be careful to write a query that will actually use the index. If you have an OR in your WHERE clause, you will probably not use an index. In newer versions of SQL Server than you have, you can use OPTION(RECOMPILE) to compile the query based on the runtime values of local variables, which usually eliminates all the ORs and allows an index to be used. See:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
You can most likely use a dynamic query where you only add the necessary conditions onto the WHERE to get optimal index usage:
The Curse and Blessings of Dynamic SQL by Erland Sommarskog
You can also see this answer for more on dynamic search conditions
Assuming searches are much more numerous than writes, you will want an index for every subset of fields by which you wish to access your data. Three fields have 7 non-empty subsets, but an index also serves any leading prefix of its key columns, so fewer than 7 indexes are needed.
I would recommend this basic approach.
1) Make sure your table has a clustered index which is Unique, Ascending, and Small (ideally an INT).
2) Create the following three non-clustered indexes:
CREATE NONCLUSTERED INDEX ON dbo.YourTable(a) INCLUDE (b,c, [plus any potential output columns])
CREATE NONCLUSTERED INDEX ON dbo.YourTable(b) INCLUDE (a,c, [plus any potential output columns])
CREATE NONCLUSTERED INDEX ON dbo.YourTable(c) INCLUDE (a,b, [plus any potential output columns])
3) Use the index DMVs to compare the times each index is hit. If an index is used heavily, experiment by adding two more indexes. (Assume the index with C as a single tree node is the heavily used index.)
CREATE NONCLUSTERED INDEX ON dbo.YourTable(c,a) INCLUDE (b, [plus any potential output columns])
CREATE NONCLUSTERED INDEX ON dbo.YourTable(c,b) INCLUDE (a, [plus any potential output columns])
Compare how frequently they're used versus the single tree node index. If they're not being used in favor of the single tree node index, they may be superfluous.
In summary, start with minimal covering indexes and experiment based on usage.

Does a unique-index on two columns imply an index on each of them?

I have a table in my schema that has a unique constraint on two columns:
UNIQUE(Column1, Column2)
The SQLite documentation tells me that this creates a unique index on these columns. My question is, does that make an explicitly created index on one of the columns, say Column1, redundant?
Yes to your example, no to your question.
A compound index on 2 columns would make the additional index on the first one redundant. However, the index on the second column might still be useful.
But if each of the columns is by itself unique, it's possible you don't need a compound index. You might want to look into that.
Having too many indexes is not always an obvious problem. But wasting resources, especially for redundant purposes, is always bad.
Any one index containing multiple columns can also serve as an index for fewer of the same columns, provided they're all the ones at the start of the index.
Let me give you an example. An index for these columns:
a, b, c, d, e, f
Can also serve as an index for the following column combinations:
a, b, c, d, e
a, b, c, d
a, b, c
a, b
a
So for your question: The index you have can also serve as an index for Column1, but not for Column2.
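One way to check this on a real SQLite build is to ask for the query plan. The sketch below uses Python's standard `sqlite3` module. The table name `t` and the `payload` column are made up for this example, and the exact wording of the plan text varies between SQLite versions, so no expected output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the UNIQUE constraint makes SQLite create an automatic
# index on (Column1, Column2).
conn.execute(
    "CREATE TABLE t (Column1 INTEGER, Column2 INTEGER, payload TEXT, "
    "UNIQUE (Column1, Column2))"
)

def plan(sql):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column

plan_col1 = plan("SELECT * FROM t WHERE Column1 = 1")
plan_col2 = plan("SELECT * FROM t WHERE Column2 = 1")
print(plan_col1)  # expected to mention the automatic index
print(plan_col2)  # expected to be a plain table scan
```

If the plan for Column1 names the automatic index and the plan for Column2 does not, that confirms that a separate index on Column1 is redundant while one on Column2 could still help.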

Multiple and single indexes

I'm kinda ashamed of asking this since I've been working with MySQL for years, but oh well.
I have a table with two fields, a and b. I will be running the following queries on it:
SELECT * FROM ... WHERE A = 1;
SELECT * FROM ... WHERE B = 1;
SELECT * FROM ... WHERE A = 1 AND B = 1;
From the performance point of view, is at least one of the following configurations of indexes slower for at least one query? If yes, please elaborate.
ALTER TABLE ... ADD INDEX (a); ALTER TABLE ... ADD INDEX (b);
ALTER TABLE ... ADD INDEX (a, b);
ALTER TABLE ... ADD INDEX (a); ALTER TABLE ... ADD INDEX (b); ALTER TABLE ... ADD INDEX (a, b);
Thanks (note that we are talking about non unique indexes)
Yes, at least one case is considerably slower. If you only define the following index:
ALTER TABLE ... ADD INDEX (a, b);
... then the query SELECT * FROM ... WHERE B = 1; will not use that index.
When you create an index with a composite key, the order of the columns of the key is important. It is recommended to try to order the columns in the key to enhance selectivity, with the most selective columns to the left-most of the key. If you don't do this, and put a non-selective column as the first part of the key, you risk not using the index at all. (Source: Tips on Optimizing SQL Server Composite Index)
It's very improbable that the mere existence of an index will slow down a SELECT query: it just won't be used.
In theory, the optimizer can incorrectly choose the longer index on (a, b) rather than the one on (a) to serve a query which searches only for a.
In practice, I've never seen it: MySQL usually makes the opposite mistake, taking a shorter index when a longer one exists.
Update:
In your case, either of the following configurations will suffice for all queries:
(a, b); (b)
or
(b, a); (a)
MySQL can also use two separate indexes with index merge (intersection), so creating these indexes
(a); (b)
will also speed up the query with a = 1 AND b = 1, though to a lesser extent than any of the solutions above.
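The difference between intersecting two single-column indexes and using one composite index can be sketched as follows. This is a toy model in Python, not MySQL's actual implementation: each index is modelled as a mapping from key to a set of row ids, and the rows are made up.

```python
from collections import defaultdict

# Hypothetical rows: row id -> (a, b). Values are made up.
rows = {1: (1, 1), 2: (1, 2), 3: (2, 1), 4: (1, 1), 5: (3, 3)}

idx_a = defaultdict(set)   # models INDEX (a)
idx_b = defaultdict(set)   # models INDEX (b)
idx_ab = defaultdict(set)  # models INDEX (a, b)
for row_id, (a, b) in rows.items():
    idx_a[a].add(row_id)
    idx_b[b].add(row_id)
    idx_ab[(a, b)].add(row_id)

# Two single-column indexes: fetch both row-id sets, then intersect them.
from_a = idx_a.get(1, set())
from_b = idx_b.get(1, set())
via_intersection = from_a & from_b
ids_read = len(from_a) + len(from_b)

# Composite index: one lookup lands directly on the matching row ids.
via_composite = idx_ab.get((1, 1), set())

print(sorted(via_intersection), sorted(via_composite), ids_read)
```

Both routes find the same rows, but the intersection has to read every row id matching a = 1 and every row id matching b = 1 before discarding the non-overlapping ones, which is why it helps to a lesser extent than the composite index.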
You may also want to read this article in my blog:
Creating indexes
Update 2:
Seems I finally understood your question :)
ALTER TABLE ... ADD INDEX (a); ALTER TABLE ... ADD INDEX (b);
Excellent for a = 1 and for b = 1 individually, reasonably good for a = 1 AND b = 1
ALTER TABLE ... ADD INDEX (a, b);
Excellent for a = 1 AND b = 1, almost excellent for a = 1, poor for b = 1
ALTER TABLE ... ADD INDEX (a); ALTER TABLE ... ADD INDEX (b); ALTER TABLE ... ADD INDEX (a, b);
Excellent for all three queries.
The optimizer will choose the index that best covers the query.
An index on (a, b) will cover the queries in cases 1 and 3, but not in case 2 (since the leading index column is a).
So to cover all three queries you need two indexes:
ALTER TABLE ... ADD INDEX (a, b); ALTER TABLE ... ADD INDEX (b)
For the example you have, index set #3 is optimal. MySQL will choose the single a and b indexes for single-column WHERE clauses, and use the compound index for the a AND b WHERE clause.