SQL Server index included columns - sql

I need help understanding how to create indexes. I have a table that looks like this
Id
Name
Age
Location
Education,
PhoneNumber
My query looks like this:
SELECT *
FROM table1
WHERE name = 'sam'
What's the correct way to create an index for this with included columns?
What if the query has a order by statement?
SELECT *
FROM table1
WHERE name = 'sam'
ORDER BY id DESC
What if I have 2 parameters in my where statement?
SELECT *
FROM table1
WHERE name = 'sam'
AND age > 12

The correct way to create an index with included columns? Either via Management Studio/Toad/etc, or SQL (documentation):
CREATE INDEX idx_table_1 ON db.table_1 (name) INCLUDE (id)
What if the Query has an ORDER BY
The ORDER BY can use indexes, if the optimizer sees fit to (determined by table statistics & query). It's up to you to test if a composite index or an index with INCLUDE columns works best by reviewing the query cost.
If id is the clustered key (not always the primary key though), I probably wouldn't INCLUDE the column...
What if I have 2 parameters in my where statement?
Same as above - you need to test what works best for your query. Might be composite, or include, or separate indexes.
But keep in mind that:
tweaking for one query won't necessarily benefit every other query
indexes do slow down INSERT/UPDATE/DELETE statements, and require maintenance
You can use the Database Tuning Advisor (DTA) for index recommendations, including when some are redundant
Recommended reading
I highly recommend reading Kimberly Tripp's "The Tipping Point" for a better understanding of index decisions and impacts.

Since I do not know which exactly tasks your DB is going to implement and how many records in it, I would suggest that you take a look at the Index Basics MSDN article. It will allow you to decide yourself which indexes to create.

If ID is your primary and/or clustered index key, just create an index on Name, Age. This will cover all three queries.
Included fields are best used to retrieve row-level values for columns that are not in the filter list, or to retrieve aggregate values where the sorted field is in the GROUP BY clause.

If inserts are rare, create as much indexes as You want.
For first query create index for name column.
Id column I think already is primary key...
Create 2nd index with name and age. You can keep only one index: 'name, ag'e and it will not be much slower for 1st query.

Related

SQL Index - are both statements going to do the same?

I was wondering if in SQL server these two statements to create a non-clustered index will have the same behavior?
create nonclustered index EmpLastname_Incl_Firstname
on employee(lastname) include (firstname);
create nonclustered index EmpLastnameFirstname
on employee(lastname, firstname)
No. The key columns are optimized for things like filtering and grouping, while the included columns are optimized for retrieval of the column only. So if a lot of your queries look like the following:
SELECT firstname, lastname
FROM mytable
WHERE lastname = 'Doe' AND firstname = 'John'
then the second index you showed would be preferred. If you only use lastname in your SELECT such as the following query:
SELECT firstname, lastname
FROM mytable
WHERE lastname = 'Doe'
Then the first query would be preferred.
If you have a mix of both queries you should take the second index only as the second query is also able to make use of the first index.
absolutely no
INCLUDE means that the data from the column is stored in the index but it is not part of the index sorting
Those statements will not have the same behavior. The index with the include will only allow key lookups on the lastname field, while the index without the include will allow key lookups on both the lastname and firstname fields. Microsoft documentation for indexes with includes. This bit is especially important to your question:
Redesign nonclustered indexes with a large index key size so that only columns used for searching and lookups are key columns. Make all other columns that cover the query into nonkey columns. In this way, you will have all columns needed to cover the query, but the index key itself is small and efficient.
If you ever need to search by the firstname field, your index should include it as a key lookup.
Adding columns to include will store the respective data only on the leaf-node level of the b-tree (not in the tree itself).
Almost everything that can be accomplished with include can also be accomplished by putting the respective columns in the key part of the index. The exceptions are related to the length limits of the key. In doubt, it might be best to leave it in the key columns.
Having that said, there are some benefits when putting a column in include rather than the key part:
the resulting index is slightly smaller (a few percent)
The tree of the index might be a one level smaller
It is documented what the column of that index is used for. That makes extending this index more easy in the future.
I find the last one the most important one.
Have a look at my recent article about this topic for a better understanding:
https://use-the-index-luke.com/blog/2019-04/include-columns-in-btree-indexes

SQL Server - Indexing and Operator precedence

I'm having table Student as below
Student(id,jdate)
where column id is primary key. Now I'm writing a query as below
select * from Student where id=2 and (jdate='date1' or jdate='date2')
Will index work here? or Can i modify as below?
select * from Student where (id=2) and (jdate='date1' or jdate='date2')
Both your examples will use the PK Index for column 'id'.
In case it is not clear, The operator "=" has precedence over "and", and thus the parenthesis are not necessary.
Since you are declaring a PK on the id column then you are defining a unique clustered index on the table as well. And since you are using the id column in the where clause then the index should be used.
The two queries, both of them, will use the index and the parenthesis around id = 2 don't change anything in the logic / condition evaluation.
Yes, both queries will work and will both hit any relevant clustered or non clustered index.
Given that id is your table PK, you probably won't even hit any index on jDate. (i.e. although at first glance the index (id, jdate) seems useful, in practice it will be redundant given that id is the PK and queries targetting a single id will either use the Clustered Index (if the default PK clustering is used), or the PK Constraint itself (if the table has different clustering).
Although the spurious parenthesis around id = 2 will be ignored, obviously and has precedence over or, so the parenthesis surrounding the or is essential:
... and (... or ...)
As other users said - both queries are the same. And PK index will be used. If you have any doubts about which index is used (in this or other queries) see execution plan: (http://technet.microsoft.com/en-us/library/ms178071%28v=sql.105%29.aspx)
Execution plan is a very useful tool. For example may prompt the missing indexes

SQL non-clustered index

I have a table that maps a user's permissions to a given object. So, it is essentially a join table to 3 different tables. (Object, User, and Permission)
The values of each row will always be unique for all 3 columns, but not any 2.
I need to create a non-clustered index. I want to put the index on the foreign keys to the object and user, but I am wondering if I should put it on all 3 columns.
"The values of each row will always be unique for all 3 columns"
You might be interested to know that SQL Server unique constraints are implemented as indexes. So if you have (or want) a constraint backing up that unique-claim of yours, you already have an index on all 3.
CREATE UNIQUE NONCLUSTERED INDEX idx_unique_perms ON UserPermissions
(
ObjectId ASC,
UserId ASC,
PermissionID ASC
)
If you make one, just remember to order your columns for high selectivity.
If you have some doubts, formulate the query(ies) you intend to execute against these tables, and run the SSMS Query Tuning Wizard. That should help you get started in the right direction.
One thing to consider is the number of rows in these three tables. If the row counts will be small, it might not even be worthwhile adding indexes. A table scan would probably be done anyway.

Adding fields to optimize MySQL queries

I have a MySQL table with 3 fields:
Location
Variable
Value
I frequently use the following query:
SELECT *
FROM Table
WHERE Location = '$Location'
AND Variable = '$Variable'
ORDER BY Location, Variable
I have over a million rows in my table and queries are somewhat slow. Would it increase query speed if I added a field VariableLocation, which is the Variable and the Location combined? I would be able to change the query to:
SELECT *
FROM Table
WHERE VariableLocation = '$Location$Variable'
ORDER BY VariableLocation
I would add a covering index, for columns location and variable:
ALTER TABLE
ADD INDEX (variable, location);
...though if the variable & location pairs are unique, they should be the primary key.
Combining the columns will likely cause more grief than it's worth. For example, if you need to pull out records by location or variable only, you'd have to substring the values in a subquery.
Try adding an index which covers the two fields you should then still get a performance boost but also keep your data understandable because it wouldn't seem like the two columns should be combine but you are just doing it to get performance.
I would advise against combining the fields. Instead, create an index that covers both fields in the same order as your ORDER BY clause:
ALTER TABLE tablename ADD INDEX (location, variable);
Combined indices and keys are only used in queries that involve all fields of the index or a subset of these fields read from left to right. Or in other words: If you use location in a WHERE condition, this index would be used, but ordering by variable would not use the index.
When trying to optimize queries, the EXPLAIN command is quite helpful: EXPLAIN in mysql docs
Correction Update:
Courtesy: #paxdiablo:
A column in the table will make no difference. All you need is an index over both columns and the MySQL engine will use that. Adding a column in the table is actually worse than that since it breaks 3NF and wastes space. See http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html which states: SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2; If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly.

What is a Covered Index?

I've just heard the term covered index in some database discussion - what does it mean?
A covering index is an index that contains all of, and possibly more, the columns you need for your query.
For instance, this:
SELECT *
FROM tablename
WHERE criteria
will typically use indexes to speed up the resolution of which rows to retrieve using criteria, but then it will go to the full table to retrieve the rows.
However, if the index contained the columns column1, column2 and column3, then this sql:
SELECT column1, column2
FROM tablename
WHERE criteria
and, provided that particular index could be used to speed up the resolution of which rows to retrieve, the index already contains the values of the columns you're interested in, so it won't have to go to the table to retrieve the rows, but can produce the results directly from the index.
This can also be used if you see that a typical query uses 1-2 columns to resolve which rows, and then typically adds another 1-2 columns, it could be beneficial to append those extra columns (if they're the same all over) to the index, so that the query processor can get everything from the index itself.
Here's an article: Index Covering Boosts SQL Server Query Performance on the subject.
Covering index is just an ordinary index. It's called "covering" if it can satisfy query without necessity to analyze data.
example:
CREATE TABLE MyTable
(
ID INT IDENTITY PRIMARY KEY,
Foo INT
)
CREATE NONCLUSTERED INDEX index1 ON MyTable(ID, Foo)
SELECT ID, Foo FROM MyTable -- All requested data are covered by index
This is one of the fastest methods to retrieve data from SQL server.
Covering indexes are indexes which "cover" all columns needed from a specific table, removing the need to access the physical table at all for a given query/ operation.
Since the index contains the desired columns (or a superset of them), table access can be replaced with an index lookup or scan -- which is generally much faster.
Columns to cover:
parameterized or static conditions; columns restricted by a parameterized or constant condition.
join columns; columns dynamically used for joining
selected columns; to answer selected values.
While covering indexes can often provide good benefit for retrieval, they do add somewhat to insert/ update overhead; due to the need to write extra or larger index rows on every update.
Covering indexes for Joined Queries
Covering indexes are probably most valuable as a performance technique for joined queries. This is because joined queries are more costly & more likely then single-table retrievals to suffer high cost performance problems.
in a joined query, covering indexes should be considered per-table.
each 'covering index' removes a physical table access from the plan & replaces it with index-only access.
investigate the plan costs & experiment with which tables are most worthwhile to replace by a covering index.
by this means, the multiplicative cost of large join plans can be significantly reduced.
For example:
select oi.title, c.name, c.address
from porderitem poi
join porder po on po.id = poi.fk_order
join customer c on c.id = po.fk_customer
where po.orderdate > ? and po.status = 'SHIPPING';
create index porder_custitem on porder (orderdate, id, status, fk_customer);
See:
http://literatejava.com/sql/covering-indexes-query-optimization/
Lets say you have a simple table with the below columns, you have only indexed Id here:
Id (Int), Telephone_Number (Int), Name (VARCHAR), Address (VARCHAR)
Imagine you have to run the below query and check whether its using index, and whether performing efficiently without I/O calls or not. Remember, you have only created an index on Id.
SELECT Id FROM mytable WHERE Telephone_Number = '55442233';
When you check for performance on this query you will be dissappointed, since Telephone_Number is not indexed this needs to fetch rows from table using I/O calls. So, this is not a covering indexed since there is some column in query which is not indexed, which leads to frequent I/O calls.
To make it a covered index you need to create a composite index on (Id, Telephone_Number).
For more details, please refer to this blog:
https://www.percona.com/blog/2006/11/23/covering-index-and-prefix-indexes/