Best indexes to create when a search may be on any or all of 3 fields

Best indexes to create when a search may be on any or all of 3 fields - sql

I am working on a new search function in our software where a user will be allowed to search on any or all of 3 possible fields A, B and C. I expect that if anything is entered for a field it will be a complete entry and not just a partial one.
So the user choices are
Field A, or
Field B, or
Field C, or
Fields A & B, or
Fields A & C, or
Fields B & C, or
Fields A, B & C.
My question is what indexes should be created on this table to provide maximum performance? This will be running on SQL Server 2005 and up I expect and a good user experience is essential.

this is difficult to answer without knowing your data or its usage. Hopefully A, B , and C are not long string data types. If you have minimal Insert/Update/Delete and/or will sacrifice everything for index usage, I would create an index on each of these:
A, B , C <<<handles queries for: A, or A & B, or A, B & C
A, C <<<handles queries for: A & C
B, C <<<handles queries for: B, or B & C
C <<<handles queries for: C
These should cover all combinations you have mentioned.
Also, you will also need to be careful to write a query that will actually use the index. If you have an OR in your WHERE you'll probably not use an index. In newer versions of SQL Server than you have you can use OPTION(RECOMPILE) to compile the query based on the runtime values of local variables and usually eliminate all OR and use an index. See:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
you can most likely use a dynamic query where you only add the necessary conditions onto the WHERE to get optional index usage:
The Curse and Blessings of Dynamic SQL by Erland Sommarskog
You can also see this answer for more on dynamic search conditions

Assuming searches are much more numerous, you will want to create an index on every subset of fields by which you wish to access your data. So that would be 6 indices if you wish to do it on the powerset of columns.

I would recommend this basic approach.
1) Make sure your table has a clustered index which is Unique, Ascending, and Small (ideally an INT).
2) Create the following three non-clustered indexes:
CREATE NONCLUSTERED INDEX ON dbo.YourTable(a) INCLUDE (b,c, [plus any potential output columns])
CREATE NONCLUSTERED INDEX ON dbo.YourTable(b) INCLUDE (a,c, [plus any potential output columns])
CREATE NONCLUSTERED INDEX ON dbo.YourTable(c) INCLUDE (a,b, [plus any potential output columns])
3) Use the index DMVs to compare the times each index is hit. If an index is used heavily, experiment by adding two more indexes. (Assume the index with C as a single tree node is the heavily used index.)
CREATE NONCLUSTERED INDEX ON dbo.YourTable(c,a) INCLUDE (b, [plus any potential output columns])
CREATE NONCLUSTERED INDEX ON dbo.YourTable(c,b) INCLUDE (a, [plus any potential output columns])
Compare how frequently they're used verses the single tree node index. If they're not being used infavor of the single tree node, they may be superfluous.
In summary, start with a minimal covering indexes and experiment based on usage.

Related

Can PostgreSQL use an index for row-wise comparison?

Let's say that we have the following SQL:
SELECT a, b, c
FROM example_table
WHERE a = '12345' AND (b, c) <= ('2020-08-15'::date, '2020-08-15 00:40:33'::timestamp)
LIMIT 20
Can PostgreSQL efficiently use a B-Tree index defined on (a, b, c) to answer this query?
To elaborate a little bit on the use-case. This SQL query is part of my cursor-pagination implementation. Since I'm using a UUID as a primary key, I have to resort to using the date/timestamp columns for the cursor, which more closely fits my actual needs anyway. I'm new to PostgreSQL and this row-wise comparison feature, so I'm unsure how I can use an index to speed it up. In my testing using "explain analyze" I wasn't able to make the query use the index, but I assume this may be due to the fact that a table scan is more efficient given that there aren't many rows in the table.

Well, it should use the index. But only the first two columns. It can scan the rows with value of a you have specified. If the index is up-to-date with the table, then Postgres can pull the values of b and c from the index. That will allow it to scan a range of values for b.

Oracle multiple vs single column index

Imagine I have a table with the following columns:
Column: A (numer(10)) (PK)
Column: B (numer(10))
Column: C (numer(10))
CREATE TABLE schema_name.table_name (
column_a number(10) primary_key,
column_b number(10) ,
column_c number(10)
);
Column A is my PK.
Imagine my application now has a flow that queries by B and C. Something like:
SELECT * FROM SCHEMA.TABLE WHERE B=30 AND C=99
If I create an index only using the Column B, this will already improve my query right?
The strategy behind this query would benefit from the index on column B?
Q1 - If so, why should I create an index with those two columns?
Q2 - If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?

The simple answers to your questions.
For this query:
SELECT *
FROM SCHEMA.TABLE
WHERE B = 30 AND C = 99;
The optimal index either (B, C) or (C, B). The order does matter because the two comparisons are =.
An index on either column can be used, but all the matching values will need to be scanned to compare to the second value.
If you have an index on (B, C), then this can be used for a query on WHERE B = 30. Oracle also implements a skip-scan optimization, so it is possible that the index could also be used for WHERE C = 99 -- but it probably would not be.
I think the documentation for MySQL has a good introduction to multi-column indexes. It doesn't cover the skip-scan but is otherwise quite applicable to Oracle.

Short answer: always check the real performance, not theoretical. It means, that my answer requires verification at real database.
Inside SQL (Oracle, Postgre, MsSql, etc.) the Primary Key is used for at least two purposes:
Ordering of rows (e.g. if PK is incremented only then all values will be appended)
Link to row. It means that if you have any extra index, it will contain whole PK to have ability to jump from additional index to other rows.
If I create an index only using the Column B, this will already improve my query right?
The strategy behind this query would benefit from the index on column B?
It depends. If your table is too small, Oracle can do just full scan of it. For large table Oracle can (and will do in common scenario) use index for column B and next do range scan. In this case Oracle check all values with B=30. Therefore, if you can only one row with B=30 then you can achieve good performance. If you have millions of such rows, Oracle will need to do million of reads. Oracle can get this information via statistic.
Q1 - If so, why should I create an index with those two columns?
It is needed to direct access to row. In this case Oracle requires just few jumps to find your row. Moreover, you can apply unique modifier to help Oracle. Then it will know, that not more than single row will be returned.
However if your table has other columns, real execution plan will include access to PK (to retrieve other rows).
If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?
Yes. Please check the details here. If index have several columns, than Oracle will sort them according to column ordering. E.g. if you create index with columns B, C then Oracle will able to use it to retrieve values like "B=30", e.g. when you restricted only B.

Well, it all depends.
If that table is tiny, you won't see any benefit regardless any indexes you might create - it is just too small and Oracle returns data immediately.
If the table is huge, then it depends on column's selectivity. There's no guarantee that Oracle will ever use that index. If optimizer decides (upon information it has - don't forget to regularly collect statistics!) that the index should not be used, then you created it in vain (though, you can choose to use a hint, but - unless you know what you're doing, don't do it).
How will you know what's going on? See the explain plan.
But, generally speaking, yes - indexes help.
Q1 - If so, why should I create an index with those two columns?
Which "two columns"? A? If it is a primary key column, Oracle automatically creates an index, you don't have to do that.
Q2 - If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?
If you are talking about a composite index (containing both B and C columns, respectively), and if query uses B column, then yes - index will (OK, might be used). But, if query uses only column C, then this index will be completely useless.

In spite of this question being answered and one answer being accepted already, I'll just throw in some more information :-)
An index is an offer to the DBMS that it can use to access data quicker in some situations. Whether it actually uses the index is a decision made by the DBMS.
Oracle has a built-in optimizer that looks at the query and tries to find the best execution plan to get the results you are after.
Let's say that 90% of all rows have B = 30 AND C = 99. Why then should Oracle laboriously walk through the index only to have to access almost every row in the table at last? So, even with an index on both columns, Oracle may decide not to use the index at all and even perform the query faster because of the decision against the index.
Now to the questions:
If I create an index only using the Column B, this will already improve my query right?
It may. If Oracle thinks that B = 30 reduces the rows it will have to read from the table imensely, it will.
If so, why should I create an index with those two columns?
If the combination of B = 30 AND C = 99 limits the rows to read from the table further, it's a good idea to use this index instead.
If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?
If the index is on (B, C), i.e. B first, then Oracle may find it useful, yes. In the extreme case that there are only the two columns in the table, that would even be a covering index (i.e. containing all columns accessed in the query) and the DBMS wouldn't have to read any table row, as all the information is already in the index itself. If the index is (C, B), i.e. C first, it is quite unlikely that the index would be used. In some edge-case situations, Oracle might do so, though.

How does SQL Server treat Included columns in a nonclustered index?

I have a question:
Definition of non-clustered index says that included columns in index are not counted by database engine in sense of index size or maximum number of columns.
So what is really the way they work?
How they help to SQL Server when they are not acting in index size?

The important thing to note is that included columns are not counted by the database engine when determining the size or number of columns in the index key (the value used to actually look up data in the index structure). They still add to the size of the index itself.
Index keys are only allowed to be 900 bytes in size across all columns that make up the key (there can be only 16 columns that make up the index key).
Adding included columns doesn't count towards the 900 byte/16 column limits, but can make the index more useful by covering more queries.

Good explanations from the other people here.
For me, included index columns are rather easy to remember and use with this simple rule:
Filters, ie. WHERE x = y etc.., are your keys, the decision whether to use the index or not is based on those. SELECT a, b, x are the values you're actually returning, those are the things you want to include in your index so SQL Server doesn't have to go searching through the clustered index / heap to find them.
Example:
CREATE NONCLUSTERED INDEX TABLEX_A_IDX ON TABLEX (A) INCLUDE (B, C)
SELECT A, B, C -- KEY + INCLUDED columns
FROM TABLEX WHERE A = 'ASD' -- KEY columns
Granted, this wasn't exactly your question, but it might help just the same.

is a db index composite by default?

when I create an index on a db2, for example with the following code:
CREATE INDEX T_IDX ON T(
A,
B)
is it a composite index?
if not: how can I then create a composite index?
if yes: in order to have two different index should I create them separately as:
CREATE INDEX T1_IDX ON T(A)
CREATE INDEX T2_IDX ON T(A)
EDIT: this discussion is not going in the direction I expect (but in a better one :)) I actually asked how, and not why to create separate indexes, I planed to do that in a different question, but since you anticipated me:
suppose I have a table T(A,B,C) and a search function search() that select from the table using any of the following method
WHERE A = x
WHERE B = x
WHERE C = x
WHERE A = x AND B=y (and so on AC, CB, ABC)
if I create a compose index ABC, is it going to working for example when I select on just C?
the table is quite big, and the insert\update not so frequent

Yep multiple fields on create index = composite by definition: Specify two or more column names to create a composite index.
Understanding when to use composite indexes appears to be your last question...
If all columns selected by a query are in a composite index, then the dbengine can return these values from the index without accessing the table. so you have faster seek time.
However if one or the other are used in queries, then creating individual indexes will serve you best. It depends on the types of queries executed and what values they contain/filter/join.
If you sometimes have one, the other, or both, then creating all 3 indexes is a possibility as well. But keep in mind each additional index increases the amount of time it takes to insert, update or delete, so on highly maintained tables, more indexes are generally bad since the overhead to maintain the indexes effects performance.

The index on A, B is a composite index, and can be used to seek on just A or a seek on A with B or for a general scan, of course.
There is usually not much of a point in having an index on A, B and an index on just A, since a partial search on A, B can be used if you only have A. That wider index will be a little less efficient, however, so if the A lookup is extremely frequent and the write requirements mean that it is acceptable to update the extra index, it could be justifiable.
Having an index on B may be necessary, since the A, B index is not very suitable for searches based on B only.

First Answer: YES
CREATE INDEX JOB_BY_DPT
ON EMPLOYEE (WORKDEPT, JOB)
Second Answer:
It depends on your query; if most of the time your query referrence a single column in where clause like select * from T where A = 'something' then a single index would be what you want but if both column A and B get referrenced then you should go for creating a composite one.
For further referrence please check
http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r0000919.htm

DB2 multiple column index issue

If I have a multiple column index(a,b,c) and one separate index d then my query is using b,a,c,d order whether both the index will be chosen? Whether index will be choosen or not for this query?

In the DB2 command line, try this:
db2 => explain plan for ...insert-query-here...
In the result, you will see when indexes are used and when DB2 will use full table scans.

If you change it to a, b, c, d then the (a, b, c) index will be used.
If you change it to d, b, a, c (or anything starting with d), the (d) index will be chosen.
Basically, use the columns in the same order as the index you want to use.

A cost-based optimizer (such as DB2's) will use the statistics for the table and index objects to determine whether the (A,B,C) index or the (D) index will be better suited for a given query. The index cardinality (number of unique values in the index) is one of several data statistics gathered by the RUNSTATS command to help the optimizer estimate the relative I/O and CPU costs with each possible access path. The explain command mentioned in Aaron Digulla's answer and also the db2expln facility can help you understand which index will be chosen.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas