When to use composite indexes? - sql

What are the general rules in regards to using composite indexes? When should you use them, and when should you avoid them?

Composite indexes are useful when your SELECT queries frequently use the indexed columns together as criteria in WHERE clauses; they improve retrieval speed. Avoid them when they are not necessary, because every additional index slows down writes and needs maintenance.
This article provides some really good information.

A query that selects only a few fields can run completely on an index. For example, if you have an index on (OrderId) this query would require a table lookup:
select Status from Orders where OrderId = 42
But if you add a composite index on (OrderId,Status) the engine can retrieve all information it needs from the index.
A sort on multiple columns can benefit from a composite index. For example, an index on (LastName, FirstName) would benefit this query:
select * from Orders order by LastName, FirstName
Sometimes you have a unique constraint on multiple columns. Say, for example, that you restart order numbers every day. Then OrderNumber is not unique, but (OrderNumber, OrderDayOfYear) is. You can enforce that with a unique composite index.
I'm sure there are many more uses for a composite index; these are just a few examples.
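The covering-index and unique-index points above can be tried out with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient stand-in engine; the table layout and the index names (ix_orders_id, ix_orders_id_status, ux_orders_number_day) are assumptions made for illustration, and other engines word their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout, only for illustration.
conn.execute(
    "CREATE TABLE Orders (OrderId INTEGER, Status TEXT, Customer TEXT,"
    " OrderNumber INTEGER, OrderDayOfYear INTEGER)"
)

def plan(sql):
    """Return the detail text of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

query = "SELECT Status FROM Orders WHERE OrderId = 42"

# Index on OrderId alone: the index locates the row, but Status still has
# to be read from the table.
conn.execute("CREATE INDEX ix_orders_id ON Orders (OrderId)")
plan_single = plan(query)

# Composite index on (OrderId, Status): the index alone can answer the
# query, which SQLite reports as a COVERING INDEX.
conn.execute("DROP INDEX ix_orders_id")
conn.execute("CREATE INDEX ix_orders_id_status ON Orders (OrderId, Status)")
plan_composite = plan(query)

print(plan_single)
print(plan_composite)

# Unique composite index: OrderNumber may repeat, the pair may not.
conn.execute(
    "CREATE UNIQUE INDEX ux_orders_number_day"
    " ON Orders (OrderNumber, OrderDayOfYear)"
)
conn.execute("INSERT INTO Orders (OrderNumber, OrderDayOfYear) VALUES (1, 100)")
conn.execute("INSERT INTO Orders (OrderNumber, OrderDayOfYear) VALUES (1, 101)")
try:
    conn.execute("INSERT INTO Orders (OrderNumber, OrderDayOfYear) VALUES (1, 100)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

The single-column index is dropped before the composite one is created so that the second plan can only be explained by the composite index.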

Related

Does PostgreSQL use all available indexes to run a query faster?

We are structuring a project where some tables will have many records, and we intend to use 4 numeric foreign keys and 1 numeric primary key. Our assumption is that if we create an index for each foreign key, plus the default index on the primary key, the PostgreSQL planner would use all of the indexes (5 in total) to perform the query.
95% of the time the queries would be providing at least the 4 foreign keys.
Would each index be used to position the search faster in the sequential section of records?
Would having 4 indexes increase the speed of the query, or would a single index on the parent level (branch_id) suffice?
Thank you for your time and experience.
example: if all foreign keys have an index
SELECT * FROM products WHERE
account_id=1 AND
organization_id=2 AND
business_id=3 AND
branch_id=4 AND
product_id=5;
example: if I only indicate the id of the primary key
SELECT * FROM products WHERE product_id=5;
If all 4 columns are specified by equality, it is possible to combine the single-column indexes using BitmapAnd. However, this would be less efficient than using one multi-column index on all four columns.
Since that will apparently be a very common query, it would make sense to have that multi-column index.
Usually you will want to index each foreign key column. Otherwise, if you want to delete an organization, for example, it would need to scan the whole table to verify that no records were still referencing it. Whichever column is the first one in the multi-column index will not need to also have a single-column index for it. But the other 3 which are not first probably still need their own indexes.
Indexes are (predominantly) used when filtering or joining tables, so whether the indexes you are proposing are useful is entirely dependent on the SQL you are running and whether the query optimiser determines that using an index would be beneficial.
For example, if you ran SELECT * FROM TABLE then none of the indexes would be used.
Many DBMSs automatically create an index when you define a primary key, and some (MySQL's InnoDB, for example) also do so for foreign keys. PostgreSQL indexes primary keys and unique constraints automatically but not foreign key columns, so those indexes have to be created explicitly if you want them.
Update
Having individual indexes on each column is not going to help much with the query you’ve provided; the optimiser will most likely use only one of them, probably the PK. A compound index on multiple columns would help, but the more columns you add to the index, the narrower the range of query patterns it will benefit.
Say you have 3 columns A, B, C and include them all in WHERE clause, then having a compound index of A+B+C would be highly beneficial.
If you keep this index but your WHERE clause only has columns A, B it will still benefit significantly as the query can still use the A+B subset of the index.
If your WHERE clause only has columns A, C then it would benefit only slightly, as it would select all records from the index that start with the A value, but then would have to filter them to find the subset with the C value.
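The A, B, C behaviour described above can be observed with a short sketch. It uses Python's sqlite3 module as a stand-in engine (not PostgreSQL); the table t, its payload column and the index name ix_abc are hypothetical. The bracketed part of each plan lists the index columns the engine could actually seek on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; payload keeps SELECT * from being answered by the index alone.
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, payload TEXT)")
conn.execute("CREATE INDEX ix_abc ON t (a, b, c)")

def plan(where):
    """Return SQLite's plan detail for SELECT * FROM t WHERE <where>."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM t WHERE " + where
    ).fetchall()
    return " | ".join(row[3] for row in rows)

# All three columns: the seek uses a, b and c.
plan_abc = plan("a = 1 AND b = 2 AND c = 3")
# Leading prefix A+B: the seek uses a and b.
plan_ab = plan("a = 1 AND b = 2")
# A and C with B missing: only a is used for the seek; c is filtered afterwards.
plan_ac = plan("a = 1 AND c = 3")

for p in (plan_abc, plan_ab, plan_ac):
    print(p)
```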

SQL Index - are both statements going to do the same?

I was wondering if in SQL server these two statements to create a non-clustered index will have the same behavior?
create nonclustered index EmpLastname_Incl_Firstname
on employee(lastname) include (firstname);
create nonclustered index EmpLastnameFirstname
on employee(lastname, firstname)
No. The key columns are optimized for things like filtering and grouping, while the included columns are optimized for retrieval of the column only. So if a lot of your queries look like the following:
SELECT firstname, lastname
FROM mytable
WHERE lastname = 'Doe' AND firstname = 'John'
then the second index you showed would be preferred. If you only use lastname in your WHERE clause, such as in the following query:
SELECT firstname, lastname
FROM mytable
WHERE lastname = 'Doe'
then the first index would be preferred.
If you have a mix of both queries, you should create the second index only, as the second query is also able to make use of it.
Absolutely not.
INCLUDE means that the data from the column is stored in the index, but it is not part of the index sort order.
Those statements will not have the same behavior. The index with INCLUDE only allows seeks on the lastname column, while the index without it allows seeks on both the lastname and firstname columns. See the Microsoft documentation for indexes with included columns. This bit is especially important to your question:
Redesign nonclustered indexes with a large index key size so that only columns used for searching and lookups are key columns. Make all other columns that cover the query into nonkey columns. In this way, you will have all columns needed to cover the query, but the index key itself is small and efficient.
If you ever need to search by the firstname field, your index should have it as a key column.
Adding columns to include will store the respective data only on the leaf-node level of the b-tree (not in the tree itself).
Almost everything that can be accomplished with include can also be accomplished by putting the respective columns in the key part of the index. The exceptions are related to the length limits of the key. When in doubt, it might be best to leave the column in the key part.
That said, there are some benefits to putting a column in include rather than the key part:
The resulting index is slightly smaller (a few percent).
The tree of the index might be one level shallower.
It documents what the column is used for in that index. That makes extending this index easier in the future.
I find the last one the most important one.
Have a look at my recent article about this topic for a better understanding:
https://use-the-index-luke.com/blog/2019-04/include-columns-in-btree-indexes

Is the addition of a second ID column beneficial to index?

Let's say I have a table tbl_FacilityOrders with two foreign keys, fk_FacilityID and fk_OrderID, in SQL Server 2005. It could contain orders from a few hundred facilities. I need to query single records and will have both the facilityID and the orderID available to me. Is it better to define an index on fk_FacilityID then fk_OrderID and pass both to the query, or to just use fk_OrderID? Since there will be fewer facility IDs than order IDs, I could see weeding out the other facilities' records first possibly being beneficial.
A second question: if I were using the two-column query above, does the order in which I write my WHERE clause columns matter, or is the engine smart enough to evaluate them in the order of the index?
E.G. Is:
WHERE fk_facilityID = #FacilityID AND fk_OrderID = #OrderID
equivalent to:
WHERE fk_OrderID = #OrderID AND fk_FacilityID = #FacilityID
?
Is it better to define an index on fk_FacilityID then fk_OrderID and pass both to the query, or to just use fk_OrderID?
If OrderId is unique, there's no real added benefit to adding the other field for the scenario given. It is a good idea to index your FKs, though, since they will almost always be a JOIN key.
If I were using the two-column query above, does the order in which I write my WHERE clause columns matter, or is the engine smart enough to evaluate them in the order of the index?
Nope, order is irrelevant here. All that matters is that the SETS of fields match, i.e. FieldA and FieldB are both in the index and in the WHERE clause.
The order of fields in the index DOES matter, though. You can't use the second field in an index without knowing the value of the first field.
You should create an index for each of your foreign keys, not just for the purpose of this question, but because indexing your foreign keys is good practice in general.
To answer your second question, the two statements are equivalent. SQL Server should internally reorder the predicates to arrive at the optimal execution plan. However, you should always validate the generated execution plan to make sure that it's behaving as you would expect.
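A quick way to check the claim that the written order of the WHERE predicates does not matter is to compare the plans for both spellings. The sketch below does this with Python's sqlite3 module as a stand-in for SQL Server; the Notes column, the index name ix_facility_order and the literal values are assumptions for illustration. On SQL Server itself you would compare the actual execution plans instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical version of the table from the question.
conn.execute(
    "CREATE TABLE tbl_FacilityOrders"
    " (fk_FacilityID INTEGER, fk_OrderID INTEGER, Notes TEXT)"
)
conn.execute(
    "CREATE INDEX ix_facility_order"
    " ON tbl_FacilityOrders (fk_FacilityID, fk_OrderID)"
)

def plan(where):
    """Return SQLite's plan detail for a SELECT * with the given WHERE."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM tbl_FacilityOrders WHERE " + where
    ).fetchall()
    return " | ".join(row[3] for row in rows)

# Same predicates, written in both orders.
plan_facility_first = plan("fk_FacilityID = 7 AND fk_OrderID = 1234")
plan_order_first = plan("fk_OrderID = 1234 AND fk_FacilityID = 7")

print(plan_facility_first)
print(plan_order_first)
print(plan_facility_first == plan_order_first)
```

The order of the columns inside the index definition is a different matter: that order is fixed when the index is created and does affect which queries can seek on it.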

SQL Server index included columns

I need help understanding how to create indexes. I have a table that looks like this
Id
Name
Age
Location
Education
PhoneNumber
My query looks like this:
SELECT *
FROM table1
WHERE name = 'sam'
What's the correct way to create an index for this with included columns?
What if the query has an ORDER BY clause?
SELECT *
FROM table1
WHERE name = 'sam'
ORDER BY id DESC
What if I have 2 parameters in my where statement?
SELECT *
FROM table1
WHERE name = 'sam'
AND age > 12
The correct way to create an index with included columns? Either via Management Studio/Toad/etc, or SQL (documentation):
CREATE INDEX idx_table_1 ON db.table_1 (name) INCLUDE (id)
What if the Query has an ORDER BY
The ORDER BY can use indexes, if the optimizer sees fit to (determined by table statistics & query). It's up to you to test if a composite index or an index with INCLUDE columns works best by reviewing the query cost.
If id is the clustered key (not always the primary key though), I probably wouldn't INCLUDE the column...
What if I have 2 parameters in my where statement?
Same as above - you need to test what works best for your query. Might be composite, or include, or separate indexes.
But keep in mind that:
tweaking for one query won't necessarily benefit every other query
indexes do slow down INSERT/UPDATE/DELETE statements, and require maintenance
You can use the Database Engine Tuning Advisor (DTA) for index recommendations, including when some are redundant
Recommended reading
I highly recommend reading Kimberly Tripp's "The Tipping Point" for a better understanding of index decisions and impacts.
Since I do not know exactly which tasks your DB is going to perform or how many records it holds, I would suggest that you take a look at the Index Basics MSDN article. It will allow you to decide for yourself which indexes to create.
If ID is your primary and/or clustered index key, just create an index on Name, Age. This index can serve all three queries.
Included fields are best used to retrieve row-level values for columns that are not in the filter list, or to retrieve aggregate values where the sorted field is in the GROUP BY clause.
If inserts are rare, create as many indexes as you want.
For the first query, create an index on the name column.
The Id column is, I think, already the primary key.
Create a second index on name and age. You can also keep only one index, on (name, age), and it will not be much slower for the first query.
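As a rough check of the "one index on (name, age)" suggestion, the sketch below asks an engine how it would run the three queries from the question. It uses Python's sqlite3 module as a stand-in (SQLite has no INCLUDE clause, so only the key columns are modelled); the column types and the index name ix_table1_name_age are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical column types for the table in the question.
conn.execute(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, age INTEGER,"
    " location TEXT, education TEXT, phonenumber TEXT)"
)
conn.execute("CREATE INDEX ix_table1_name_age ON table1 (name, age)")

def plan(sql):
    """Return the detail text of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

queries = [
    "SELECT * FROM table1 WHERE name = 'sam'",
    "SELECT * FROM table1 WHERE name = 'sam' ORDER BY id DESC",
    "SELECT * FROM table1 WHERE name = 'sam' AND age > 12",
]
plans = [plan(q) for q in queries]

# First query: equality seek on name.
# Second query: same seek; the engine may still need a separate sort step.
# Third query: equality on name plus a range on age within the same index.
for q, p in zip(queries, plans):
    print(q)
    print("   ", p)
```

Because the queries use SELECT *, the remaining columns still have to be fetched from the table for every matching row; whether that matters depends on how many rows match, so measuring on real data remains the deciding step.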

Composite database indexes

I'm looking for confirmation of my understanding of composite indexes in databases - specifically in relation to SQL Server 2008 R2, if that makes a difference.
I think I understand that the order of the columns of the index is crucial in that if I have an index of { [Name], [Date] }, then a SELECT based on a WHERE clause based on [Date] won't be able to use the index, but an index of { [Date], [Name] } would. If the SELECT is based on both columns, either index would be usable.
Is that right? What are the benefits of using a composite index like this, over two indexes on each column (i.e. { [Date] }, and { [Name] }).
Thanks!
Not quite. A selection on date alone could still use the index, but not as effectively as a query that also includes name, because name limits how much of the index has to be searched.
If you often query on name + date together and also on date and name separately, use 3 indexes, one for each combination.
Also, putting the most varied (most selective) field first in an index narrows the searched portion of the index sooner, which makes lookups faster.
You can also have included columns: data that is not part of the index key but is often fetched via the index.
That is correct.
A composite index is useful when the combined selectivity of the composite columns prunes the result set effectively.
If you add 'INCLUDED' columns to an index (composite or non-composite), you can create a 'covering' index to cover a query (or queries), which is desirable as it removes the need to perform a second lookup to obtain those columns (from the clustered index).
The choice of two single column indexes OR a composite index of the combined columns is determined by the total query workload against that table.
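To see how column order in { [Name], [Date] } plays out, the sketch below compares plans for queries on Name only, on the date only, and on both, and then adds a separate single-column index. It uses Python's sqlite3 module as a stand-in for SQL Server; the table Events, the column name EventDate (used in place of [Date]) and the index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; Notes keeps SELECT * from being covered by the index.
conn.execute(
    "CREATE TABLE Events (Id INTEGER PRIMARY KEY, Name TEXT,"
    " EventDate TEXT, Notes TEXT)"
)
conn.execute("CREATE INDEX ix_name_date ON Events (Name, EventDate)")

def plan(where):
    """Return SQLite's plan detail for SELECT * FROM Events WHERE <where>."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM Events WHERE " + where
    ).fetchall()
    return " | ".join(row[3] for row in rows)

# Leading column present: the index can be searched.
plan_name = plan("Name = 'x'")
# Both columns present: the search uses both.
plan_both = plan("Name = 'x' AND EventDate = '2020-01-01'")
# Only the second column: no seek is possible, so the engine scans.
plan_date = plan("EventDate = '2020-01-01'")

# A separate single-column index makes the date-only query searchable.
conn.execute("CREATE INDEX ix_date ON Events (EventDate)")
plan_date_indexed = plan("EventDate = '2020-01-01'")

for p in (plan_name, plan_both, plan_date, plan_date_indexed):
    print(p)
```

Which combination of indexes is worth keeping still depends on the whole workload, since each additional index has to be maintained on every write.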