Multiple and single indexes

I'm kinda ashamed of asking this since I've been working with MySQL for years, but oh well.
I have a table with two fields, a and b. I will be running the following queries on it:
SELECT * FROM ... WHERE A = 1;
SELECT * FROM ... WHERE B = 1;
SELECT * FROM ... WHERE A = 1 AND B = 1;
From the performance point of view, is at least one of the following configurations of indexes slower for at least one query? If yes, please elaborate.
ALTER TABLE ... ADD INDEX (a); ALTER TABLE ... ADD INDEX (b);
ALTER TABLE ... ADD INDEX (a, b);
ALTER TABLE ... ADD INDEX (a); ALTER TABLE ... ADD INDEX (b); ALTER TABLE ... ADD INDEX (a, b);
Thanks (note that we are talking about non unique indexes)

Yes, at least one case is considerably slower. If you only define the following index:
ALTER TABLE ... ADD INDEX (a, b);
... then the query SELECT * FROM ... WHERE B = 1; will not use that index.
When you create an index with a composite key, the order of the columns of the key is important. It is recommended to try to order the columns in the key to enhance selectivity, with the most selective columns to the left-most of the key. If you don't do this, and put a non-selective column as the first part of the key, you risk not using the index at all. (Source: Tips on Optimizing SQL Server Composite Index)
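The leftmost-column rule described above can be checked on any engine that exposes its query plan. Here is a minimal sketch using SQLite through Python's sqlite3 module, chosen only because it ships with Python. The table t, the payload column and the index name idx_ab are made up for illustration, and MySQL's optimizer may make different choices.

```python
import sqlite3

def plan(conn, sql):
    """Return the EXPLAIN QUERY PLAN detail strings for a statement."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# Hypothetical table: two filter columns plus a payload column, so that
# the composite index does not cover SELECT *.
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_ab ON t (a, b)")

queries = {
    "a only": "SELECT * FROM t WHERE a = 1",
    "b only": "SELECT * FROM t WHERE b = 1",
    "a and b": "SELECT * FROM t WHERE a = 1 AND b = 1",
}
plans = {name: plan(conn, sql) for name, sql in queries.items()}
for name, steps in plans.items():
    print(name, "->", steps)
```

On an unanalyzed SQLite database, the two queries that filter on a should report a SEARCH using idx_ab, while the query on b alone falls back to a full SCAN of the table.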

It's very improbable that the mere existence of an index will slow down a SELECT query: the index just won't be used.
In theory, the optimizer can incorrectly choose the longer index on (a, b) rather than the one on (a) to serve a query which searches only for a.
In practice, I've never seen it: MySQL usually makes the opposite mistake, taking a shorter index when a longer one exists.
Update:
In your case, either of the following configurations will suffice for all queries:
(a, b); (b)
or
(b, a); (a)
MySQL can also combine two separate indexes with an index merge (intersection), so creating these indexes
(a); (b)
will also speed up the query with a = 1 AND b = 1, though to a lesser extent than any of the solutions above.
You may also want to read this article in my blog:
Creating indexes
Update 2:
Seems I finally understood your question :)
ALTER TABLE ... ADD INDEX (a); ALTER TABLE ... ADD INDEX (b);
Excellent for a = 1 and b = 1, reasonably good for a = 1 AND b = 1
ALTER TABLE ... ADD INDEX (a, b);
Excellent for a = 1 AND b = 1, almost excellent for a = 1, poor for b = 1
ALTER TABLE ... ADD INDEX (a); ALTER TABLE ... ADD INDEX (b); ALTER TABLE ... ADD INDEX (a, b);
Excellent for all three queries.

The optimizer will generally choose the index that best fits the query.
An index on (a, b) will serve the queries in cases 1 and 3, but not in case 2 (since the leading index column is a).
So to cover all three queries you need two indexes:
ALTER TABLE ... ADD INDEX (a, b); ALTER TABLE ... ADD INDEX (b)
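As a rough check of the two-index configuration above, the sketch below builds it in SQLite (again via Python's sqlite3, not MySQL) and asks the planner whether each of the three queries gets an index seek. The table t, the payload column and the index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_ab ON t (a, b)")  # serves a and a+b
conn.execute("CREATE INDEX idx_b ON t (b)")      # serves b alone

def uses_index_seek(where_clause):
    """True if the plan for this WHERE clause contains an index SEARCH step."""
    sql = "EXPLAIN QUERY PLAN SELECT * FROM t WHERE " + where_clause
    return any("SEARCH" in row[-1] for row in conn.execute(sql))

results = {
    clause: uses_index_seek(clause)
    for clause in ("a = 1", "b = 1", "a = 1 AND b = 1")
}
print(results)
```

Which of the two indexes the planner picks for a given query is its own decision; the point of the sketch is only that none of the three queries needs a full table scan.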

For the example you have, index set #3 is optimal. MySQL will choose the single-column a and b indexes for the single-column WHERE clauses, and use the compound index for the a AND b WHERE clause.

Related

Composite Indexes, the “Include” Keyword, and How They Work

In SQL Server (and most other relational databases), a "Composite Index" is an index with multiple keys. Let's say we have this query that gets run a lot, and we want to create a covering index for this query to speed it up;
SELECT a, b FROM MyTable WHERE c = @val1 AND d = @val2
These are all possible composite indexes that would cover this query;
CREATE INDEX ix1 ON MyTable (c, d, a, b)
CREATE INDEX ix2 ON MyTable (c, d) INCLUDE (a, b)
CREATE INDEX ix3 ON MyTable (d) INCLUDE (a, b, c)
CREATE INDEX ix4 ON MyTable (c) INCLUDE (a, b, d)
But apparently, they don't perform equally. According to Erland Sommarskog (Microsoft MVP), the first two are faster than the third and fourth, and the fourth is faster than the third.
He goes on to explain;
ix2 is the "best" index, because a and b will not take up space in the higher levels of the index tree. Also, if a or b are updated, in ix2 there can be no page splits or similar as the index tree is unaffected.
However, I am having a hard time grasping what exactly is going on. I do have the general knowledge on b-tree indexes and how they work, but I don't understand the logic behind composite keys. For example;
CREATE INDEX ix1 ON MyTable (c, d, a, b)
Does the order of the columns here matter? If so, why? Also;
CREATE INDEX ix2 ON MyTable (c, d) INCLUDE (a, b)
What is the difference between this composite key and the one above? I don't understand what difference "INCLUDE" makes.
Note: I know there are a lot of posts on Composite Keys, but I believe my last two questions are specific enough to not be a duplicate.

Does the order of the columns here matter?
Considering only the query in your question with 2 equality predicates, the order of the composite index key columns doesn't matter as long as both are the leftmost key columns of the composite index. Any of the covering indexes below will optimize this query:
CREATE INDEX ix1 ON MyTable (c, d, a, b);
CREATE INDEX ix2 ON MyTable (c, d) INCLUDE (a, b);
CREATE INDEX ix3 ON MyTable (d, c, a, b);
CREATE INDEX ix4 ON MyTable (d, c, b, a);
CREATE INDEX ix5 ON MyTable (d, c) INCLUDE (a, b);
That said, the stats histogram contains only the leftmost index key column so the general guidance is to specify the most selective column first to improve row count estimates and execution plan quality. This consideration is more important for non-trivial queries where the optimizer has many choices and row count estimates are an important factor in choosing the best plan.
Another consideration for key order, which may conflict with the above general guidance, is when the index supports different queries and only some of the key columns are specified (e.g. SELECT a, b FROM MyTable WHERE d = @val2;). In that case, it would be better to specify d as the leftmost column regardless of selectivity in order to allow a single index to optimize multiple queries instead of creating a separate index to optimize the second query.
What is the difference between this composite key and the one above? I don't understand what difference "INCLUDE" makes.
Included columns are not key columns. Key columns are maintained in logical order at every level throughout the b-tree whereas included columns are present only in the b-tree leaf nodes and not ordered. Consequently, the specified order of included columns does not matter. The only purpose of included columns is to help cover queries without adding them as key columns and incurring the associated overhead.
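The difference between key columns and included columns can be pictured with a toy model in Python. This is only a sketch under simplifying assumptions: a sorted list stands in for the b-tree, the rows are invented, and the function names are hypothetical. It models ix2 = (c, d) INCLUDE (a, b): entries are ordered by the key (c, d), and (a, b) ride along as payload that plays no part in the ordering.

```python
from bisect import bisect_left

# Hypothetical rows of MyTable as (a, b, c, d) tuples.
rows = [(10, 11, 1, 5), (20, 21, 1, 7), (30, 31, 2, 5), (40, 41, 3, 7)]

# Entries sorted by the key (c, d); (a, b) is carried as payload.
entries = sorted(((c, d), (a, b)) for a, b, c, d in rows)
keys = [key for key, _ in entries]

def seek_c_d(c, d):
    """Binary search on the full key (c, d)."""
    lo, hi = bisect_left(keys, (c, d)), bisect_left(keys, (c, d + 1))
    return [payload for _, payload in entries[lo:hi]]

def seek_c(c):
    """Binary search on the leading key column only."""
    lo, hi = bisect_left(keys, (c,)), bisect_left(keys, (c + 1,))
    return [payload for _, payload in entries[lo:hi]]

def scan_d(d):
    """d alone is not a key prefix, so every entry has to be examined."""
    return [payload for (_, key_d), payload in entries if key_d == d]

print(seek_c_d(1, 7), seek_c(1), scan_d(5))
```

Both seeks return (a, b) straight from the entry without touching the table, which is what covering means here. A lookup on d alone still works, but only by walking the whole list.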

CREATE INDEX ix1 ON MyTable (c, d, a, b)
Does the order of the columns here matter? If so, why? Also;
Yes, order is very important when creating an index: entries are sorted by the leftmost column first, then by the next column within each value of the first, and so on. So for the optimizer to seek on this index, the query must always filter on c, which is the "opener" of this set.
CREATE INDEX ix2 ON MyTable (c, d) INCLUDE (a, b)
What is the difference between this composite key and the one above? I don't understand what difference "INCLUDE" makes.
But keep in mind that every additional key column makes the index less efficient. So if you know that more than 80% of your queries will seek only by c and d and not by a and b, but you still need a and b in your SELECT list (not in WHERE), you should INCLUDE them so that they are stored only in the leaves at the last level of the index.
There are better explanations than mine so feel free to look at them:
INCLUDE equivalent in Oracle -> INCLUDE
How important is the order of columns in indexes? -> ORDER in INDEX set

Why is this query not using an index sort?

Note: Table/Column/Index names are made up.
Background
I am having some trouble figuring out how to query one of my database tables efficiently (the table has about a million rows). The query in question involves a WHERE clause with a foreign key and an ORDER BY clause with another column.
The database generates an index on the FK and I create an index on the column I will use for ordering:
CREATE INDEX ab ON a(b);
Problem
When I run the query without filtering on the FK:
EXPLAIN SELECT * FROM a ORDER BY b;
The database properly uses the index to sort. I know this because the result (truncated) from this query returns:
FROM PUBLIC.A
/* PUBLIC.AB */
ORDER BY 3
/* index sorted */
However, when the query is modified to filter on the FK:
EXPLAIN SELECT * FROM a WHERE a_fk_id = 3 ORDER BY b
Only the FK index is used:
FROM PUBLIC.A
/* PUBLIC.A_FK_INDEX_NAME: A_FK_ID = 3 */
WHERE A_FK_ID = 3
ORDER BY 3
As you can see, only the FK index is used.
Questions
What is going on here?
I thought perhaps it had something to do with the separate indexes, but even creating a multi-column index like:
CREATE INDEX a_fk_id_b ON a(a_fk_id, b);
Did nothing to resolve the problem (neither did reversing the order of those columns in the index, but I didn't expect it to anyways).
Any suggestions would be greatly appreciated. I am by no means a database or SQL expert, but I was surprised to get these results. Perhaps I just need to query for this information differently, but I figured this was a relatively simple case.

Turns out the answer is very simple, although not entirely obvious (at least not to me). The table needed to have a composite index between the FK and the order column:
CREATE INDEX a_fk_id_b ON a(a_fk_id, b);
And in order for the database to utilize that index instead of the generated FK index, the a_fk_id column needed to be included in the ORDER BY clause:
EXPLAIN SELECT * FROM a WHERE a_fk_id = 3 ORDER BY a_fk_id, b;
This results in our composite index being used for filtering as well as for ordering as shown in this truncated explain plan:
FROM PUBLIC.A
/* PUBLIC.A_FK_ID_B: A_FK_ID = 3 */
WHERE A_FK_ID = 3
ORDER BY 25, 19
/* index sorted */
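The same effect can be reproduced in SQLite, shown here as a sketch via Python's sqlite3. SQLite is used only because it ships with Python; its planner is not the one from the question, and for instance it does not need a_fk_id repeated in the ORDER BY. The column types and the index name a_fk are assumptions. Each call builds a fresh table with a single index so the planner's choice is unambiguous.

```python
import sqlite3

def order_by_plan(index_sql):
    """Build table a with one index and return the plan for the filtered sort."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE a (id INTEGER PRIMARY KEY, a_fk_id INTEGER, b INTEGER)"
    )
    conn.execute(index_sql)
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM a WHERE a_fk_id = 3 ORDER BY b"
    )
    details = [row[-1] for row in rows]
    conn.close()
    return details

# Index on the FK column only: rows are found by index, then sorted separately.
fk_only = order_by_plan("CREATE INDEX a_fk ON a (a_fk_id)")
# Composite index: within a_fk_id = 3 the entries are already ordered by b.
composite = order_by_plan("CREATE INDEX a_fk_id_b ON a (a_fk_id, b)")

print(fk_only)
print(composite)
```

With the FK-only index the plan contains a separate sorting step (SQLite reports it as a temporary b-tree for ORDER BY); with the composite index that step disappears.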

Should I use multicolumn index or two 1-column?

I have a table which I currently define as follows:
CREATE TABLE pairs (
id INTEGER PRIMARY KEY,
p1 INTEGER,
p2 INTEGER,
r INTEGER,
UNIQUE(p1, p2) ON CONFLICT IGNORE,
FOREIGN KEY (p1) REFERENCES points(id),
FOREIGN KEY (p2) REFERENCES points(id)
)
After that it is filled with gigabytes of data. Now I will need to do a lot of selects exactly like this:
SELECT id, r FROM pairs WHERE p1 = 666 OR p2 = 666
So the question is: what indexes should I create to speed up this select?
CREATE INDEX p1_index ON pairs(p1)
CREATE INDEX p2_index ON pairs(p2)
or maybe
CREATE UNIQUE INDEX p_index ON pairs(p1, p2)
or maybe even both? (and buy a new HDD for them).
SQLite3 does not automatically create an index for a UNIQUE constraint on multiple columns.

Since you are using an OR condition, I would go with two single-column indexes. If it were an AND condition, a multi-column index would work better.
For the OR condition:
The optimizer can look up p1 = 666 in one index and p2 = 666 in the other, then combine the two sets of matching rows, dropping duplicates. Both indexes have to be consulted, because a row qualifies if either column matches. Whether the two lookups run in parallel depends on the engine.
For the AND condition:
If only the two single-column indexes are available, the optimizer has to either use one of them and filter the remaining rows against the base table, or scan both and intersect the results before fetching from the base table. Either may turn out to be expensive. Here, a multi-column index would have been great.
But then again, the optimizer may choose a different path based on the available table and index statistics.
Hope this helps.

Use EXPLAIN QUERY PLAN to check if indexes are used.
For your example query, both the single-column indexes would be used:
> EXPLAIN QUERY PLAN SELECT id, r FROM pairs WHERE p1 = 666 OR p2 = 666;
0|0|0|SEARCH TABLE pairs USING INDEX p1_index (p1=?) (~10 rows)
0|0|0|SEARCH TABLE pairs USING INDEX p2_index (p2=?) (~10 rows)
The multi-column index (which you already have because of the UNIQUE constraint) would be used if the lookup of a single record needs both columns:
> EXPLAIN QUERY PLAN SELECT id, r FROM pairs WHERE p1 = 666 AND p2 = 666;
0|0|0|SEARCH TABLE pairs USING INDEX sqlite_autoindex_pairs_1 (p1=? AND p2=?) (~1 rows)
However, a multi-column index can also be used for lookups on its first column(s):
> DROP INDEX p1_index;
> EXPLAIN QUERY PLAN SELECT id, r FROM pairs WHERE p1 = 666 OR p2 = 666;
0|0|0|SEARCH TABLE pairs USING INDEX sqlite_autoindex_pairs_1 (p1=?) (~10 rows)
0|0|0|SEARCH TABLE pairs USING INDEX p2_index (p2=?) (~10 rows)
Also see the documentation:
Query Optimizer Overview,
Query Planning.
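For anyone who wants to rerun these checks from a script, here is a small sketch with Python's sqlite3 module. It recreates the pairs table from the question, adds only p2_index, lists the indexes SQLite actually holds, and captures the plan for the OR query. The minimal points table is an assumption added so the foreign keys have a target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE points (id INTEGER PRIMARY KEY);
CREATE TABLE pairs (
    id INTEGER PRIMARY KEY,
    p1 INTEGER,
    p2 INTEGER,
    r INTEGER,
    UNIQUE (p1, p2) ON CONFLICT IGNORE,
    FOREIGN KEY (p1) REFERENCES points(id),
    FOREIGN KEY (p2) REFERENCES points(id)
);
CREATE INDEX p2_index ON pairs (p2);
""")

# PRAGMA index_list returns one row per index; the name is the second field.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('pairs')")]

or_plan = [
    row[-1]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id, r FROM pairs WHERE p1 = 666 OR p2 = 666"
    )
]

print(index_names)
print(or_plan)
```

The index list should include the automatic index backing the UNIQUE constraint, and the plan should show one SEARCH per side of the OR. The exact wording of the plan lines varies between SQLite versions.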

is a db index composite by default?

when I create an index on a db2, for example with the following code:
CREATE INDEX T_IDX ON T(
A,
B)
is it a composite index?
if not: how can I then create a composite index?
if yes: in order to have two different indexes, should I create them separately as:
CREATE INDEX T1_IDX ON T(A)
CREATE INDEX T2_IDX ON T(B)
EDIT: this discussion is not going in the direction I expected (but in a better one :)). I actually asked how, not why, to create separate indexes; I planned to ask that in a different question, but since you anticipated me:
suppose I have a table T(A,B,C) and a search function search() that selects from the table using any of the following methods
WHERE A = x
WHERE B = x
WHERE C = x
WHERE A = x AND B=y (and so on AC, CB, ABC)
if I create a composite index on ABC, is it going to work, for example, when I select on just C?
the table is quite big, and inserts/updates are not so frequent

Yep multiple fields on create index = composite by definition: Specify two or more column names to create a composite index.
Understanding when to use composite indexes appears to be your last question...
If all columns selected by a query are in a composite index, then the database engine can return these values from the index without accessing the table, so the lookup is faster.
However, if only one column or the other is used in queries, then creating individual indexes will serve you best. It depends on the types of queries executed and which columns they filter or join on.
If you sometimes have one, the other, or both, then creating all 3 indexes is a possibility as well. But keep in mind that each additional index increases the time it takes to insert, update or delete, so on heavily updated tables more indexes are generally bad, since the overhead of maintaining them affects performance.

The index on A, B is a composite index, and can be used to seek on just A or a seek on A with B or for a general scan, of course.
There is usually not much of a point in having an index on A, B and an index on just A, since a partial search on A, B can be used if you only have A. That wider index will be a little less efficient, however, so if the A lookup is extremely frequent and the write requirements mean that it is acceptable to update the extra index, it could be justifiable.
Having an index on B may be necessary, since the A, B index is not very suitable for searches based on B only.

First Answer: YES
CREATE INDEX JOB_BY_DPT
ON EMPLOYEE (WORKDEPT, JOB)
Second Answer:
It depends on your queries. If most of the time a query references a single column in the WHERE clause, like select * from T where A = 'something', then a single-column index is what you want; but if both columns A and B are referenced, you should create a composite one.
For further reference please check
http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r0000919.htm

Do covering indices safely replace smaller covering indices?

For example:
Given columns A,B,C,D,
IX_A is an index on 'A'
IX_AB is a covering index on 'AB'
IX_A can be safely removed, for it is redundant: IX_AB will be used in its place.
I want to know if this generalizes:
If I have:
IX_AB
IX_ABC
IX_ABCD
and so forth,
Can the lesser indices still be safely removed?
That is, does IX_ABC make IX_AB redundant, and does IX_ABCD make both IX_AB and IX_ABC redundant?

In general (and this varies from server to server), a multi-column index also serves lookups on a leading subset of its columns.
So if you have an index on a, b, c, that usually gives you, in effect, an index on a and on a, b as well.
You do not, however, get the equivalent of an index on b, c.

Yes, for the most part.
However, IX_ABCD isn't terribly helpful as a replacement for, say, IX_BCD.
There is a caveat, however: indexes still may require disk reads, so if C and D explode the size of the index, there will be some inefficiency in looking up A,B in IX_ABCD that wouldn't occur when looking it up in IX_AB.
However, that difference is likely outweighed by the additional performance hit of maintaining IX_AB separately.

The important thing is the leading columns in the index. If you have the index IX_ABCD the following queries will use the index:
select * from table where A = 1
select * from table where A = 1 and B = 1
select * from table where A = 1 and B = 1 and C = 1
However, the following will most likely not use the index (at least not how you intended):
select * from table where B = 1
select * from table where C = 1
select * from table where B = 1 and C = 1
The important thing is that the leading columns are used. Therefore the order of the columns when the index is created does matter.
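The leading-column rule boils down to a few lines of Python. This is a simplified model, not any engine's actual logic: the function name seekable_prefix is made up, only equality filters are considered, and real optimizers have extra tricks such as skip scans.

```python
def seekable_prefix(index_columns, equality_columns):
    """Return the leading index columns that equality filters can seek on."""
    prefix = []
    for column in index_columns:
        if column not in equality_columns:
            break  # the first gap ends the usable prefix
        prefix.append(column)
    return prefix

ix_abcd = ["A", "B", "C", "D"]
for filtered in ({"A"}, {"A", "B"}, {"A", "B", "C"}, {"B"}, {"B", "C"}, {"A", "C"}):
    print(sorted(filtered), "->", seekable_prefix(ix_abcd, filtered))
```

An empty result means the index cannot be used for a seek at all. A filter on A and C can seek on A only, with C checked afterwards against the entries found.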

Not necessarily. While it is true that an index on (A, B, C) can be used for a filtering predicate on A, an ordering request on A or a join condition on A, that does not necessarily mean that the index on (A) alone is useless. If the index on (A, B, C) is considerably wider than (A), then a range scan on A alone will save significant I/O when it uses the narrower index, because it has to read fewer pages.
But I admit that this would be the exception rather than the rule. In general it is safe to remove an index on A if another one on (A, B) exists. Note that an index on (A, B) does not satisfy any filtering on B, so removal is safe only if the leftmost column(s) are the same. Some databases have 'skip-scan' operators that can use an index on (A, B) for looking up B, but that is a very narrow border case.
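To get a feel for the I/O argument, here is a back-of-the-envelope sketch in Python. All the numbers are made up for illustration (row count, entry sizes, an 8 KiB page), and it ignores page headers, fill factor and the upper b-tree levels.

```python
import math

def leaf_pages(row_count, entry_bytes, page_bytes=8192):
    """Rough count of leaf pages needed to hold one index entry per row."""
    entries_per_page = page_bytes // entry_bytes
    return math.ceil(row_count / entries_per_page)

rows = 1_000_000                 # hypothetical table size
narrow = leaf_pages(rows, 12)    # assumed entry size for an index on (A)
wide = leaf_pages(rows, 40)      # assumed entry size for an index on (A, B, C)
print(narrow, wide)
```

Under these assumptions a full range scan of the wide index touches roughly three times as many pages as the narrow one, which is the saving that has to be weighed against maintaining a second index.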

It is always best not to assume anything about database engine internals and to check the actual query plans being used.
