is a db index composite by default? - sql

when I create an index on a db2, for example with the following code:
CREATE INDEX T_IDX ON T(
A,
B)
is it a composite index?
if not: how can I then create a composite index?
if yes: in order to have two different index should I create them separately as:
CREATE INDEX T1_IDX ON T(A)
CREATE INDEX T2_IDX ON T(A)
EDIT: this discussion is not going in the direction I expect (but in a better one :)) I actually asked how, and not why to create separate indexes, I planed to do that in a different question, but since you anticipated me:
suppose I have a table T(A,B,C) and a search function search() that select from the table using any of the following method
WHERE A = x
WHERE B = x
WHERE C = x
WHERE A = x AND B=y (and so on AC, CB, ABC)
if I create a compose index ABC, is it going to working for example when I select on just C?
the table is quite big, and the insert\update not so frequent

Yep multiple fields on create index = composite by definition: Specify two or more column names to create a composite index.
Understanding when to use composite indexes appears to be your last question...
If all columns selected by a query are in a composite index, then the dbengine can return these values from the index without accessing the table. so you have faster seek time.
However if one or the other are used in queries, then creating individual indexes will serve you best. It depends on the types of queries executed and what values they contain/filter/join.
If you sometimes have one, the other, or both, then creating all 3 indexes is a possibility as well. But keep in mind each additional index increases the amount of time it takes to insert, update or delete, so on highly maintained tables, more indexes are generally bad since the overhead to maintain the indexes effects performance.

The index on A, B is a composite index, and can be used to seek on just A or a seek on A with B or for a general scan, of course.
There is usually not much of a point in having an index on A, B and an index on just A, since a partial search on A, B can be used if you only have A. That wider index will be a little less efficient, however, so if the A lookup is extremely frequent and the write requirements mean that it is acceptable to update the extra index, it could be justifiable.
Having an index on B may be necessary, since the A, B index is not very suitable for searches based on B only.

First Answer: YES
CREATE INDEX JOB_BY_DPT
ON EMPLOYEE (WORKDEPT, JOB)
Second Answer:
It depends on your query; if most of the time your query referrence a single column in where clause like select * from T where A = 'something' then a single index would be what you want but if both column A and B get referrenced then you should go for creating a composite one.
For further referrence please check
http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r0000919.htm

Related

Does PostgreSQL use all available indexes to run a query faster?

We are structuring a project where some tables will have many records, and we intend to use 4 numeric foreign keys and 1 numeric primary, our assumption is that if we create an index for each foreign key and the default index of the primary key, the postgres planning would use all the starts (5 in total) to perform the query.
95% of the time the queries would be providing at least the 4 foreign keys.
Would each index be used to position the search faster in the sequential section of records?
Would having 4 indexes increase the speed of the query or would it suffice with a single index of the parent level (branch_id)?
Thank you for your time and experience.
example: if all foreign keys have an index
SELECT * FROM products WHERE
account_d=1 AND
organization_id=2 AND
business_id=3 AND
branch_id=4 AND
product_id=5;
example: if I only indicate the id of the primary key
SELECT * FROM products WHERE product_id=5;
If all 4 columns are specified by equality, it is possible to combine the single-column indexes using BitmapAnd. However, this would be less efficient than using one multi-column index on all four columns.
Since that will apparently be a very common query, it would make sense to have that multi-column index.
Usually you will want to index each foreign key column. Otherwise, if you want to delete an organization, for example, it would need to scan the whole table to verify that no records were still referencing it. Whichever column is the first one in the multi-column index will not need to also have a single-column index for it. But the other 3 which are not first probably still need their own indexes.
Indexes are (predominantly) used when filtering or joining tables, so whether the indexes you are proposing are useful is entirely dependent on the SQL you are running and whether the query optimiser determines that using an index would be beneficial.
For example, if you ran SELECT * FROM TABLE then none of the indexes would be used.
I can’t comment on Postgresql specifically but many/most DBMSs automatically create indexes when you define PKs/FKs - so you will get the indexes anyway, regardless of any performance tuning you are trying to implement
Update
Having individual indexes on each column is not going to help with the query you’ve provided, the optimiser will only use one of them, probably the PK. A compound index on multiple columns would help, but the more columns you add to the index, the more restrictive the pattern of queries it will benefit.
Say you have 3 columns A, B, C and include them all in WHERE clause, then having a compound index of A+B+C would be highly beneficial.
If you keep this index but your WHERE clause only has columns A, B it will still benefit significantly as the query can still use the A+B subset of the index.
If your WHERE clause only has columns A,C then it would benefit only slightly, as it would select all records from the index that start with the A value - but then would have to filter them to find the subset with the C value

Oracle multiple vs single column index

Imagine I have a table with the following columns:
Column: A (numer(10)) (PK)
Column: B (numer(10))
Column: C (numer(10))
CREATE TABLE schema_name.table_name (
column_a number(10) primary_key,
column_b number(10) ,
column_c number(10)
);
Column A is my PK.
Imagine my application now has a flow that queries by B and C. Something like:
SELECT * FROM SCHEMA.TABLE WHERE B=30 AND C=99
If I create an index only using the Column B, this will already improve my query right?
The strategy behind this query would benefit from the index on column B?
Q1 - If so, why should I create an index with those two columns?
Q2 - If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?
The simple answers to your questions.
For this query:
SELECT *
FROM SCHEMA.TABLE
WHERE B = 30 AND C = 99;
The optimal index either (B, C) or (C, B). The order does matter because the two comparisons are =.
An index on either column can be used, but all the matching values will need to be scanned to compare to the second value.
If you have an index on (B, C), then this can be used for a query on WHERE B = 30. Oracle also implements a skip-scan optimization, so it is possible that the index could also be used for WHERE C = 99 -- but it probably would not be.
I think the documentation for MySQL has a good introduction to multi-column indexes. It doesn't cover the skip-scan but is otherwise quite applicable to Oracle.
Short answer: always check the real performance, not theoretical. It means, that my answer requires verification at real database.
Inside SQL (Oracle, Postgre, MsSql, etc.) the Primary Key is used for at least two purposes:
Ordering of rows (e.g. if PK is incremented only then all values will be appended)
Link to row. It means that if you have any extra index, it will contain whole PK to have ability to jump from additional index to other rows.
If I create an index only using the Column B, this will already improve my query right?
The strategy behind this query would benefit from the index on column B?
It depends. If your table is too small, Oracle can do just full scan of it. For large table Oracle can (and will do in common scenario) use index for column B and next do range scan. In this case Oracle check all values with B=30. Therefore, if you can only one row with B=30 then you can achieve good performance. If you have millions of such rows, Oracle will need to do million of reads. Oracle can get this information via statistic.
Q1 - If so, why should I create an index with those two columns?
It is needed to direct access to row. In this case Oracle requires just few jumps to find your row. Moreover, you can apply unique modifier to help Oracle. Then it will know, that not more than single row will be returned.
However if your table has other columns, real execution plan will include access to PK (to retrieve other rows).
If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?
Yes. Please check the details here. If index have several columns, than Oracle will sort them according to column ordering. E.g. if you create index with columns B, C then Oracle will able to use it to retrieve values like "B=30", e.g. when you restricted only B.
Well, it all depends.
If that table is tiny, you won't see any benefit regardless any indexes you might create - it is just too small and Oracle returns data immediately.
If the table is huge, then it depends on column's selectivity. There's no guarantee that Oracle will ever use that index. If optimizer decides (upon information it has - don't forget to regularly collect statistics!) that the index should not be used, then you created it in vain (though, you can choose to use a hint, but - unless you know what you're doing, don't do it).
How will you know what's going on? See the explain plan.
But, generally speaking, yes - indexes help.
Q1 - If so, why should I create an index with those two columns?
Which "two columns"? A? If it is a primary key column, Oracle automatically creates an index, you don't have to do that.
Q2 - If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?
If you are talking about a composite index (containing both B and C columns, respectively), and if query uses B column, then yes - index will (OK, might be used). But, if query uses only column C, then this index will be completely useless.
In spite of this question being answered and one answer being accepted already, I'll just throw in some more information :-)
An index is an offer to the DBMS that it can use to access data quicker in some situations. Whether it actually uses the index is a decision made by the DBMS.
Oracle has a built-in optimizer that looks at the query and tries to find the best execution plan to get the results you are after.
Let's say that 90% of all rows have B = 30 AND C = 99. Why then should Oracle laboriously walk through the index only to have to access almost every row in the table at last? So, even with an index on both columns, Oracle may decide not to use the index at all and even perform the query faster because of the decision against the index.
Now to the questions:
If I create an index only using the Column B, this will already improve my query right?
It may. If Oracle thinks that B = 30 reduces the rows it will have to read from the table imensely, it will.
If so, why should I create an index with those two columns?
If the combination of B = 30 AND C = 99 limits the rows to read from the table further, it's a good idea to use this index instead.
If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?
If the index is on (B, C), i.e. B first, then Oracle may find it useful, yes. In the extreme case that there are only the two columns in the table, that would even be a covering index (i.e. containing all columns accessed in the query) and the DBMS wouldn't have to read any table row, as all the information is already in the index itself. If the index is (C, B), i.e. C first, it is quite unlikely that the index would be used. In some edge-case situations, Oracle might do so, though.

SQL Server range indexing ideas

I need help understanding how to create proper indexing on a table for fast range selects.
I have a table with the following columns:
Column --- Type
frameidx --- int
u --- int
v --- int
x --- float(53)
y --- float(53)
z --- float(53)
None of these columns is unique.
There are to be approximately 30 million records in this table.
An average query would look something like this:
Select x, y, z from tablename
Where
frameidx = 4 AND
u between 34 AND 500
v between 0 AND 200
Pretty straight forward, no joins, no nested stuff. Just good ol' subset selection.
What sort of indexing should I do in MS SQL Server (2012) for this table in order to be able to fetch records (which can be in the thousands from this query) in (ideally) less than a 100ms, for example?
Thanks.
If you don't have indices, SQL Server needs to scan the whole table to find the required data. For such a big table (30M rows), that's time consuming.
If you have indices appropriate for your query, the SQL server will seek them (i.e. it will quickly find the required rows in the index, using the index structure). The index consists of the indexed column values, in the given index order, and pointers to the rows in the indexed table, so once the data is found in the index, the necessary data from the indexed table is recovered using those pointers.
SO, if you want to speed up thing, you need to create indexes for the columns which you're going to use to filter the ranges.
Adding indexes will improve the query response time, but will also take up more space, and make the insertions slower. So you shouldn't create a lot of indexes.
If you're going to use all the columns for filtering all the time, you should make only one index. And, ideally, that index should be the more selective, i.e. the one that has the most different values (the least number of repeated values). Only one index can be used for each query.
If you're going to use different sets of range filters, you should create more indexes.
Using a composite can be good or bad. In a composite key, the rows are ordered by all of the columns in the index. So, provided you index by A, B, C & D, filtering or ordering by A will give consecutive rows of the index, and it's a quick operation. And filtering by A, B, C & D, is ideal for this index. However, filtering or ordering only by D, is the worst case for this index, because it will need to recover data spread all over the index: remember that the data is ordered by A, then B, then C, then D, so the D info is spread all over the index. Depending on several factors (table stats, index selectivity, and so on), it's even possible that no index is used at all, and the table is scanned.
A final note on the clustered index: a clustered index defines the physical order in which the data is stored in the table. It doesn't need to be unique. If you're using one of the columns for filtering most of the times, it's a good idea to make that the table's clustered index, because, in this case, instead of seeking an index and finding the data in the indexed table using pointers, the table is sought directly, and that can improve performance.
So there is no simple answer, but I hope to know you have info to improve your query speed.
EDIT
Corrected info, according to a very interesting comment.

How does SQL Server treat Included columns in a nonclustered index?

I have a question:
Definition of non-clustered index says that included columns in index are not counted by database engine in sense of index size or maximum number of columns.
So what is really the way they work?
How they help to SQL Server when they are not acting in index size?
The important thing to note is that included columns are not counted by the database engine when determining the size or number of columns in the index key (the value used to actually look up data in the index structure). They still add to the size of the index itself.
Index keys are only allowed to be 900 bytes in size across all columns that make up the key (there can be only 16 columns that make up the index key).
Adding included columns doesn't count towards the 900 byte/16 column limits, but can make the index more useful by covering more queries.
Good explanations from the other people here.
For me, included index columns are rather easy to remember and use with this simple rule:
Filters, ie. WHERE x = y etc.., are your keys, the decision whether to use the index or not is based on those. SELECT a, b, x are the values you're actually returning, those are the things you want to include in your index so SQL Server doesn't have to go searching through the clustered index / heap to find them.
Example:
CREATE NONCLUSTERED INDEX TABLEX_A_IDX ON TABLEX (A) INCLUDE (B, C)
SELECT A, B, C -- KEY + INCLUDED columns
FROM TABLEX WHERE A = 'ASD' -- KEY columns
Granted, this wasn't exactly your question, but it might help just the same.

Is an index on A, B redundant if there is an index on A, B, C?

Having years of experience as a DBA, I do believe I know the answer to the question, but I figured it never hurts to check my bases.
Using SQL Server, assuming I have a table which has an index on column A and column B, and a second index on columns A, B, and C, would it be safe to drop the first index, as the second index basically would satisfy queries that would benefit from the first index?
It depends, but the answer is often 'Yes, you could drop the index on (A,B)'.
The counter-case (where you would not drop the index on (A,B)) is when the index on (A,B) is a unique index that is enforcing a constraint; then you do not want to drop the index on (A,B). The index on (A,B,C) could also be unique, but the uniqueness is redundant because the (A,B) combination is unique because of the other index.
But in the absence of such unusual cases (for example, if both (A,B) and (A,B,C) allow duplicate entries), then the (A,B) index is logically redundant. However, if the column C is 'wide' (a CHAR(100) column perhaps), whereas A and B are small (say INTEGER), then the (A,B) index is more efficient than the (A,B,C) index because you can get more information read per page of the (A,B) index. So, even though (A,B) is redundant, it may be worth keeping. You also need to consider the volatility of the table; if the table seldom changes, the extra indexes don't matter much; if the table changes a lot, extra indexes slow up modifications to the table. Whether that's significant is difficult to guess; you probably need to do the performance measurements.
The first index covers queries that look up on A , A,B and the second index can be used to cover queries that look up on A , A,B or A,B,C which is clearly a superset of the first case.
If C is very wide however the index on A,B may still be useful as it can satisfy certain queries with fewer reads.
e.g. if C was a char(800) column the following query may benefit significantly from having the narrower index available.
SELECT a,b
FROM YourTable
ORDER BY a,b
Yes, this is a common optimization. Any query that would benefit from the index on A,B can also benefit just as well from the index on A,B,C.
In the MySQL community, there's even a tool to search your whole schema for redundant indexes: http://www.percona.com/doc/percona-toolkit/pt-duplicate-key-checker.html
The possible exception case would be if the index on A,B were more compact and used much more frequently, and you wanted to control which index was kept loaded in memory.
Much of what I was thinking was written by Jonathan in a previous answer. Uniqueness, faster work, and one other thing I think he missed.
If the first index is made A desc, B asc and second A asc, B asc, C asc, then deleting the first index isn't really a way to go, because the second one isn't a superset of the first one, and your query cannot benefit from the second index if ordering is as written in the first one.
In some cases like when you use the first index, you can order by A desc, B asc (of course) and A asc, B desc, but you can also make a query that will use any part of that index, like Order by A desc.
But a query like order by A asc, B asc, will not be 'covered' by the first index.
So I would add up, you can usually delete the first index, but that depends on your table configuration and your query (and, of course, indexes).
I typically would find this "almost" similar index in table that contains historical data. If column C is a date or integer column, be careful. It is most likely used to satisfy the MAX function as in WHERE tblA.C = MAX(tblB.C), which skips the table altogether and utilize an index only access path.