How does SQL Server treat Included columns in a nonclustered index? - sql

I have a question:
Definition of non-clustered index says that included columns in index are not counted by database engine in sense of index size or maximum number of columns.
So what is really the way they work?
How they help to SQL Server when they are not acting in index size?

The important thing to note is that included columns are not counted by the database engine when determining the size or number of columns in the index key (the value used to actually look up data in the index structure). They still add to the size of the index itself.
Index keys are only allowed to be 900 bytes in size across all columns that make up the key (there can be only 16 columns that make up the index key).
Adding included columns doesn't count towards the 900 byte/16 column limits, but can make the index more useful by covering more queries.

Good explanations from the other people here.
For me, included index columns are rather easy to remember and use with this simple rule:
Filters, ie. WHERE x = y etc.., are your keys, the decision whether to use the index or not is based on those. SELECT a, b, x are the values you're actually returning, those are the things you want to include in your index so SQL Server doesn't have to go searching through the clustered index / heap to find them.
Example:
CREATE NONCLUSTERED INDEX TABLEX_A_IDX ON TABLEX (A) INCLUDE (B, C)
SELECT A, B, C -- KEY + INCLUDED columns
FROM TABLEX WHERE A = 'ASD' -- KEY columns
Granted, this wasn't exactly your question, but it might help just the same.

Related

Does PostgreSQL use all available indexes to run a query faster?

We are structuring a project where some tables will have many records, and we intend to use 4 numeric foreign keys and 1 numeric primary, our assumption is that if we create an index for each foreign key and the default index of the primary key, the postgres planning would use all the starts (5 in total) to perform the query.
95% of the time the queries would be providing at least the 4 foreign keys.
Would each index be used to position the search faster in the sequential section of records?
Would having 4 indexes increase the speed of the query or would it suffice with a single index of the parent level (branch_id)?
Thank you for your time and experience.
example: if all foreign keys have an index
SELECT * FROM products WHERE
account_d=1 AND
organization_id=2 AND
business_id=3 AND
branch_id=4 AND
product_id=5;
example: if I only indicate the id of the primary key
SELECT * FROM products WHERE product_id=5;
If all 4 columns are specified by equality, it is possible to combine the single-column indexes using BitmapAnd. However, this would be less efficient than using one multi-column index on all four columns.
Since that will apparently be a very common query, it would make sense to have that multi-column index.
Usually you will want to index each foreign key column. Otherwise, if you want to delete an organization, for example, it would need to scan the whole table to verify that no records were still referencing it. Whichever column is the first one in the multi-column index will not need to also have a single-column index for it. But the other 3 which are not first probably still need their own indexes.
Indexes are (predominantly) used when filtering or joining tables, so whether the indexes you are proposing are useful is entirely dependent on the SQL you are running and whether the query optimiser determines that using an index would be beneficial.
For example, if you ran SELECT * FROM TABLE then none of the indexes would be used.
I can’t comment on Postgresql specifically but many/most DBMSs automatically create indexes when you define PKs/FKs - so you will get the indexes anyway, regardless of any performance tuning you are trying to implement
Update
Having individual indexes on each column is not going to help with the query you’ve provided, the optimiser will only use one of them, probably the PK. A compound index on multiple columns would help, but the more columns you add to the index, the more restrictive the pattern of queries it will benefit.
Say you have 3 columns A, B, C and include them all in WHERE clause, then having a compound index of A+B+C would be highly beneficial.
If you keep this index but your WHERE clause only has columns A, B it will still benefit significantly as the query can still use the A+B subset of the index.
If your WHERE clause only has columns A,C then it would benefit only slightly, as it would select all records from the index that start with the A value - but then would have to filter them to find the subset with the C value

Oracle multiple vs single column index

Imagine I have a table with the following columns:
Column: A (numer(10)) (PK)
Column: B (numer(10))
Column: C (numer(10))
CREATE TABLE schema_name.table_name (
column_a number(10) primary_key,
column_b number(10) ,
column_c number(10)
);
Column A is my PK.
Imagine my application now has a flow that queries by B and C. Something like:
SELECT * FROM SCHEMA.TABLE WHERE B=30 AND C=99
If I create an index only using the Column B, this will already improve my query right?
The strategy behind this query would benefit from the index on column B?
Q1 - If so, why should I create an index with those two columns?
Q2 - If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?
The simple answers to your questions.
For this query:
SELECT *
FROM SCHEMA.TABLE
WHERE B = 30 AND C = 99;
The optimal index either (B, C) or (C, B). The order does matter because the two comparisons are =.
An index on either column can be used, but all the matching values will need to be scanned to compare to the second value.
If you have an index on (B, C), then this can be used for a query on WHERE B = 30. Oracle also implements a skip-scan optimization, so it is possible that the index could also be used for WHERE C = 99 -- but it probably would not be.
I think the documentation for MySQL has a good introduction to multi-column indexes. It doesn't cover the skip-scan but is otherwise quite applicable to Oracle.
Short answer: always check the real performance, not theoretical. It means, that my answer requires verification at real database.
Inside SQL (Oracle, Postgre, MsSql, etc.) the Primary Key is used for at least two purposes:
Ordering of rows (e.g. if PK is incremented only then all values will be appended)
Link to row. It means that if you have any extra index, it will contain whole PK to have ability to jump from additional index to other rows.
If I create an index only using the Column B, this will already improve my query right?
The strategy behind this query would benefit from the index on column B?
It depends. If your table is too small, Oracle can do just full scan of it. For large table Oracle can (and will do in common scenario) use index for column B and next do range scan. In this case Oracle check all values with B=30. Therefore, if you can only one row with B=30 then you can achieve good performance. If you have millions of such rows, Oracle will need to do million of reads. Oracle can get this information via statistic.
Q1 - If so, why should I create an index with those two columns?
It is needed to direct access to row. In this case Oracle requires just few jumps to find your row. Moreover, you can apply unique modifier to help Oracle. Then it will know, that not more than single row will be returned.
However if your table has other columns, real execution plan will include access to PK (to retrieve other rows).
If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?
Yes. Please check the details here. If index have several columns, than Oracle will sort them according to column ordering. E.g. if you create index with columns B, C then Oracle will able to use it to retrieve values like "B=30", e.g. when you restricted only B.
Well, it all depends.
If that table is tiny, you won't see any benefit regardless any indexes you might create - it is just too small and Oracle returns data immediately.
If the table is huge, then it depends on column's selectivity. There's no guarantee that Oracle will ever use that index. If optimizer decides (upon information it has - don't forget to regularly collect statistics!) that the index should not be used, then you created it in vain (though, you can choose to use a hint, but - unless you know what you're doing, don't do it).
How will you know what's going on? See the explain plan.
But, generally speaking, yes - indexes help.
Q1 - If so, why should I create an index with those two columns?
Which "two columns"? A? If it is a primary key column, Oracle automatically creates an index, you don't have to do that.
Q2 - If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?
If you are talking about a composite index (containing both B and C columns, respectively), and if query uses B column, then yes - index will (OK, might be used). But, if query uses only column C, then this index will be completely useless.
In spite of this question being answered and one answer being accepted already, I'll just throw in some more information :-)
An index is an offer to the DBMS that it can use to access data quicker in some situations. Whether it actually uses the index is a decision made by the DBMS.
Oracle has a built-in optimizer that looks at the query and tries to find the best execution plan to get the results you are after.
Let's say that 90% of all rows have B = 30 AND C = 99. Why then should Oracle laboriously walk through the index only to have to access almost every row in the table at last? So, even with an index on both columns, Oracle may decide not to use the index at all and even perform the query faster because of the decision against the index.
Now to the questions:
If I create an index only using the Column B, this will already improve my query right?
It may. If Oracle thinks that B = 30 reduces the rows it will have to read from the table imensely, it will.
If so, why should I create an index with those two columns?
If the combination of B = 30 AND C = 99 limits the rows to read from the table further, it's a good idea to use this index instead.
If I decided to create an index with B and C, If I query selecting only B, would this one be affected by the index?
If the index is on (B, C), i.e. B first, then Oracle may find it useful, yes. In the extreme case that there are only the two columns in the table, that would even be a covering index (i.e. containing all columns accessed in the query) and the DBMS wouldn't have to read any table row, as all the information is already in the index itself. If the index is (C, B), i.e. C first, it is quite unlikely that the index would be used. In some edge-case situations, Oracle might do so, though.

SQL Server range indexing ideas

I need help understanding how to create proper indexing on a table for fast range selects.
I have a table with the following columns:
Column --- Type
frameidx --- int
u --- int
v --- int
x --- float(53)
y --- float(53)
z --- float(53)
None of these columns is unique.
There are to be approximately 30 million records in this table.
An average query would look something like this:
Select x, y, z from tablename
Where
frameidx = 4 AND
u between 34 AND 500
v between 0 AND 200
Pretty straight forward, no joins, no nested stuff. Just good ol' subset selection.
What sort of indexing should I do in MS SQL Server (2012) for this table in order to be able to fetch records (which can be in the thousands from this query) in (ideally) less than a 100ms, for example?
Thanks.
If you don't have indices, SQL Server needs to scan the whole table to find the required data. For such a big table (30M rows), that's time consuming.
If you have indices appropriate for your query, the SQL server will seek them (i.e. it will quickly find the required rows in the index, using the index structure). The index consists of the indexed column values, in the given index order, and pointers to the rows in the indexed table, so once the data is found in the index, the necessary data from the indexed table is recovered using those pointers.
SO, if you want to speed up thing, you need to create indexes for the columns which you're going to use to filter the ranges.
Adding indexes will improve the query response time, but will also take up more space, and make the insertions slower. So you shouldn't create a lot of indexes.
If you're going to use all the columns for filtering all the time, you should make only one index. And, ideally, that index should be the more selective, i.e. the one that has the most different values (the least number of repeated values). Only one index can be used for each query.
If you're going to use different sets of range filters, you should create more indexes.
Using a composite can be good or bad. In a composite key, the rows are ordered by all of the columns in the index. So, provided you index by A, B, C & D, filtering or ordering by A will give consecutive rows of the index, and it's a quick operation. And filtering by A, B, C & D, is ideal for this index. However, filtering or ordering only by D, is the worst case for this index, because it will need to recover data spread all over the index: remember that the data is ordered by A, then B, then C, then D, so the D info is spread all over the index. Depending on several factors (table stats, index selectivity, and so on), it's even possible that no index is used at all, and the table is scanned.
A final note on the clustered index: a clustered index defines the physical order in which the data is stored in the table. It doesn't need to be unique. If you're using one of the columns for filtering most of the times, it's a good idea to make that the table's clustered index, because, in this case, instead of seeking an index and finding the data in the indexed table using pointers, the table is sought directly, and that can improve performance.
So there is no simple answer, but I hope to know you have info to improve your query speed.
EDIT
Corrected info, according to a very interesting comment.

SQL Server multiple index order optimization

I have a table with a nonclustered index1 on ID1 and ID2, in that order.
Select count(distinct(id1)) from table
returns 1
and Select count(distinct(id2)) from table has all the values of the table.
The querys to that table uses ... where id1= XX and id2 = XX
Could it make any performance improvement if I switch the order of the fields of index1 ?
I know it SHOULD be better but maybe: is it indifferent because id1 has only 1 value?
If I understand correctly, you are comparing these two statements:
where id1= XX and id2 = XX
Under most circumstances, this would use either an index on table(id1, id2) or table(id2, id1). The order of the comparisons in the where (or on) clauses has no impact on which indexes can be used.
Whether you should include a column that has only a single value in the unique index is a different matter. There is a minor performance effect to having a more complex index -- the tree structure has to store more bytes for each key. However, the query:
select count(distinct id2)
from table
where id1 = xx and idx = xx
will actually run faster with a composite index than with a singleton index table(id2). The reason is that the composite index can be used to entirely satisfy the query (in the jargon, it is a "covering index for the query"). The singleton index would need to look up the value of id1 in the table data, which requires extra processing.
The order you define the columns in your Index matters. If your column ID1 will always only have 1 value, then there is no point in putting it into the index, unless you are using it in a Covering Index in a Non-Clustered Index (meaning an Index not the physical ordering of the Table itself). In general, your first column defined in your Index should be the column with the most Varying Values that you need to search through. Visualize it this way, if you had a table of 1 million rows, and the first Column in your Index only had 1 (or small number) of varying values, then would that Index help you in finding the rows you want among the 1 million? Or would it be better to have ID2 first, which would be more efficient for the search, and which would be more frequently used, is what you have to ask yourself. Below is also more info on your question.
SQL Server Clustered Index - Order of Index Question
If you are using a Non-Clustered index, it may appear to not make a Different if your first Column in your Index is all the same values. However it does matter, the reason being is a Non-Clustered Index is stored on a number of Pages. The more entries you can store on a Page which helps you search faster the better. If you include a Column on a Page which adds no value to the Search, then it will requires the same Index to span more Pages. Meaning more Pages to flip through and Longer Lookups. It also means less Room to add new entries to an Existing Page during Inserts when the index is updated, causing more Page Splits. So there are side effects to the decision to add a Column of only 1 value to the Index. If you are using the Column to "cover" retrieved values in common selects, then you can also use Included Columns in your Index, which has the added benefit of not reordering your Index and yet acts like a Covered Index. If that was the intended purpose originally for adding a Column which only has 1 value.

is a db index composite by default?

when I create an index on a db2, for example with the following code:
CREATE INDEX T_IDX ON T(
A,
B)
is it a composite index?
if not: how can I then create a composite index?
if yes: in order to have two different index should I create them separately as:
CREATE INDEX T1_IDX ON T(A)
CREATE INDEX T2_IDX ON T(A)
EDIT: this discussion is not going in the direction I expect (but in a better one :)) I actually asked how, and not why to create separate indexes, I planed to do that in a different question, but since you anticipated me:
suppose I have a table T(A,B,C) and a search function search() that select from the table using any of the following method
WHERE A = x
WHERE B = x
WHERE C = x
WHERE A = x AND B=y (and so on AC, CB, ABC)
if I create a compose index ABC, is it going to working for example when I select on just C?
the table is quite big, and the insert\update not so frequent
Yep multiple fields on create index = composite by definition: Specify two or more column names to create a composite index.
Understanding when to use composite indexes appears to be your last question...
If all columns selected by a query are in a composite index, then the dbengine can return these values from the index without accessing the table. so you have faster seek time.
However if one or the other are used in queries, then creating individual indexes will serve you best. It depends on the types of queries executed and what values they contain/filter/join.
If you sometimes have one, the other, or both, then creating all 3 indexes is a possibility as well. But keep in mind each additional index increases the amount of time it takes to insert, update or delete, so on highly maintained tables, more indexes are generally bad since the overhead to maintain the indexes effects performance.
The index on A, B is a composite index, and can be used to seek on just A or a seek on A with B or for a general scan, of course.
There is usually not much of a point in having an index on A, B and an index on just A, since a partial search on A, B can be used if you only have A. That wider index will be a little less efficient, however, so if the A lookup is extremely frequent and the write requirements mean that it is acceptable to update the extra index, it could be justifiable.
Having an index on B may be necessary, since the A, B index is not very suitable for searches based on B only.
First Answer: YES
CREATE INDEX JOB_BY_DPT
ON EMPLOYEE (WORKDEPT, JOB)
Second Answer:
It depends on your query; if most of the time your query referrence a single column in where clause like select * from T where A = 'something' then a single index would be what you want but if both column A and B get referrenced then you should go for creating a composite one.
For further referrence please check
http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/admin/r0000919.htm