How to create an index for this scenario in SQL Server?

What is the best index on this Item table for the following query?
select
tt.itemlookupcode,
tt.TotalQuantity,
tt.ExtendedPrice,
tt.ExtendedCost,
items.ExtendedDescription,
items.SubDescription1,
dept.Name,
categories.Name,
sup.Code,
sup.SupplierName
from
#temp_tt tt
left join HQMatajer.dbo.Item items
on items.ItemLookupCode=tt.itemlookupcode
left join HQMatajer.dbo.Department dept
ON dept.ID=items.DepartmentID
left join HQMatajer.dbo.Category categories
on categories.ID=items.CategoryID
left join HQMatajer.dbo.Supplier sup
ON sup.ID=items.SupplierID
drop table #temp_tt
I created an index like this:
CREATE NONCLUSTERED INDEX [JFC_ItemLookupCode_DepartmentID_CategoryID_SupplierID_INC_Description_SubDescriptions] ON [dbo].[Item]
(
[DBTimeStamp] ASC,
[ItemLookupCode] ASC,
[DepartmentID] ASC,
[CategoryID] ASC,
[SupplierID] ASC
)
INCLUDE (
[Description],
[SubDescription1]
)
But when I check the execution plan, the optimizer picked another index instead. That index has only the TimeStamp column.
What is the best index for this scenario on that particular table?

The first column in an index should be part of the filtering; otherwise the index will not be used. In your index the first column is DBTimeStamp, and it is not filtered on in your query. That is the reason your index is not used.
Also, in the covering part of the index you have used [Description] and [SubDescription1], but in the query you select ExtendedDescription and items.SubDescription1, so there is the additional overhead of a key/RID lookup.
Try altering your index like this:
CREATE NONCLUSTERED INDEX [JFC_ItemLookupCode_DepartmentID_CategoryID_SupplierID_INC_Description_SubDescriptions] ON [dbo].[Item]
(
[ItemLookupCode] ASC,
[DepartmentID] ASC,
[CategoryID] ASC,
[SupplierID] ASC
)
INCLUDE (
[ExtendedDescription],
[SubDescription1]
)
Having said that, the optimizer may still go for a scan or choose some other index, depending on how much data is retrieved from the Item table.

I'm not surprised your index isn't used. DBTimeStamp is likely to be highly selective, and is not referenced in your query at all.
You might have forgotten to include an ORDER BY clause in your query which was intended to reference DBTimeStamp. But even then your query would probably need to scan the entire index, so it may as well scan the actual table.
The only way to make that index 'look enticing' would be to ensure it includes all columns that are used/returned, i.e. you'd need to add ExtendedDescription. The reason this can help is that indexes typically require less storage than the full table, so they're faster to read from disk. But if you're missing columns (in your case ExtendedDescription), then the engine needs to perform an additional lookup into the full table in any case.
I can't comment why the DBTimeStamp column is preferred - you haven't given enough detail. But perhaps it's the CLUSTERED index?
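One quick way to check which index is the clustered one is to query the catalog views (a sketch, assuming you can run it against the HQMatajer database):
SELECT name, type_desc
FROM HQMatajer.sys.indexes
WHERE object_id = OBJECT_ID('HQMatajer.dbo.Item');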
Your index would be almost certain to be used if defined as:
(
[ItemLookupCode] ASC --The only thing you're actually filtering by
)
INCLUDE (
/* Moving the rest to include is most efficient for the index tree.
And by including ALL used columns, there's no need to perform
extra lookups to the full table.
*/
[DepartmentID],
[CategoryID],
[SupplierID],
[ExtendedDescription],
[SubDescription1]
)
Note, however, that this kind of indexing strategy ('find the best index for each query used') is unsustainable.
You're better off finding 'narrower' indexes that are appropriate for multiple queries.
Every index slows down INSERT and UPDATE queries.
And indexes like this are impacted by more columns than the preferred 'narrower' indexes.
Index choice should focus on the selectivity of columns. I.e. Given a specific value or small range of values, what percentage of data is likely to be selected based on your queries?
In your case, I'd expect ItemLookupCode to be unique per item in the Items table. In other words, indexing by that column without any includes should be sufficient. However, since you're joining to a temp table that could theoretically include all item codes, in some cases it might be better to scan the CLUSTERED INDEX anyway.
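For illustration, a minimal sketch of that narrower index (the index name is my own, not from the question):
-- ItemLookupCode is the only column used for the join/filter;
-- the clustered key is carried in the nonclustered index automatically
CREATE NONCLUSTERED INDEX IX_Item_ItemLookupCode ON [dbo].[Item]
(
[ItemLookupCode] ASC
)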

Related

MS SQL: Performance for querying ID descending

This question relates to a table in Microsoft SQL Server which is usually queried with ORDER BY Id DESC.
Would there be a performance benefit from setting the primary key to PRIMARY KEY CLUSTERED (Id DESC)? Or would there be a need for an index? Or is it as fast as it gets without any of it?
Table:
CREATE TABLE [dbo].[Items] (
[Id] INT IDENTITY (1, 1) NOT NULL,
[Category] INT NOT NULL,
[Name] NVARCHAR(255) NULL,
CONSTRAINT [PK_Items] PRIMARY KEY CLUSTERED ([Id] ASC)
)
Query:
SELECT TOP 1 * FROM [dbo].[Items]
WHERE Category = 123
ORDER BY [Id] DESC
Would there be a performance benefit from setting the primary key to PRIMARY KEY CLUSTERED (Id DESC)?
Given what you show: IT DEPENDS.
The filter is on Category = 123. To find all entries of Category 123, because there is NO INDEX defined, the server has to do a table scan. Unless you have a VERY large result set, and/or an awfully, comically badly configured tempdb and very low memory (because disk is only used when tempdb runs out of memory), the sorting of the result will be irrelevant compared to the table scan.
You are literally following the wrong trail. You are WAY more likely to speed up the query by adding a non-unique index on Category so that the query can prefilter the data fast based on your query condition.
If you analyze the query plan for this query (which you should - technically we should not even ANSWER this question without you showing SOME effort, and a look at the query plan is like the FIRST thing you do), you would very likely see that the time is spent on the table scan, NOT on the result sort.
Creating an index in ascending or descending order does not make a big difference for an ORDER BY on a single column, but when data needs to be sorted in two different directions (one column ascending and the other descending), the way the index is created does make a big difference.
Look at this article, which works through many examples:
https://www.mssqltips.com/sqlservertip/1337/building-sql-server-indexes-in-ascending-vs-descending-order/
In your scenario I advise you to create an index on the Category column without including Id, because the clustered index key is always included in a non-clustered index.
There is no difference according to the following
I'd suggest defining an index on (category, id desc).
It will give you the best performance for your query.
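A minimal sketch of that index (the index name is an assumption):
CREATE NONCLUSTERED INDEX IX_Items_Category_Id
ON [dbo].[Items] (Category ASC, Id DESC);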
As others have indicated, an index on Category (assuming you don't have one) is the biggest performance boost possible here.
But as for your actual question: for a single ORDER BY column like you have, it does not matter for performance whether the query/index is ordered desc or asc. SQL Server can swap those easily (starting at the beginning or the end of the data structure).
Where performance becomes an issue is when you:
Have more than one ORDER BY column
Your index has more than one column
Your ORDER BY is opposing the order on the index.
So, say your primary key had ID asc and Category asc, and then you query with ORDER BY ID asc, Category desc. Then SQL Server can't use the order of the index to provide that ordering.
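A hypothetical illustration of that mixed-direction case, reusing the Items table from the question: with an index ordered (Id ASC, Category ASC), the ORDER BY below cannot be satisfied by reading the index forwards or backwards, so SQL Server has to add a sort.
SELECT Id, Category, Name
FROM [dbo].[Items]
ORDER BY Id ASC, Category DESC;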
There are a few caveats and gotchas. After searching a bit, this answer seems to have them listed:
SQL Server indexes - ascending or descending, what difference does it make?

If I have a single nonclustered index on a table, will the number of columns I include change the slowdown when writing to it?

On the exact same table, if I was to put one index on it, either:
CREATE INDEX ix_single ON MyTable (uid asc) include (columnone)
or:
CREATE INDEX ix_multi ON MyTable (uid asc) include (
columnone,
columntwo,
columnthree,
....
columnX
)
Would the second index cause an even greater lag on how long it takes to write to the table than the first one? And why?
Included columns will need more disk space, as well as more time during data manipulation...
If there is a clustered index on this table too (ideally on an implicitly sorted column like an IDENTITY column, to avoid fragmentation), it will serve as a fast lookup for all columns (but you must create the clustered index before the other one...)
Including columns in an index is a useful approach only for extremely performance-critical scenarios...
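If you want to measure the space side of that trade-off, here is a sketch of how the two candidate indexes could be compared once both exist (standard DMVs, run in the table's database):
SELECT i.name AS index_name,
       SUM(ps.used_page_count) * 8 AS used_kb
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
    ON ps.object_id = i.object_id AND ps.index_id = i.index_id
WHERE i.object_id = OBJECT_ID('dbo.MyTable')
  AND i.name IN ('ix_single', 'ix_multi')
GROUP BY i.name;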

On what basis indexes are selected

I have a question: I have a clustered index on orderid, productid and a non-clustered index on productid. When I use the following query, it uses a nonclustered index seek on productid, which I expected:
select orderid, productid
from Sales.OrderDetails
where productid =1
order by productid
However, without changing the search arguments, I added the Quantity:
select orderid, productid, qty
from Sales.OrderDetails
where productid =1
order by productid
Now it uses a clustered index scan; and when I force the use of the nonclustered index (productid), the performance drops.
Every nonclustered index will contain the clustering key in its leaf level nodes. So your nonclustered index on productid really contains productid and orderid in its leaf level nodes.
So your first query can be satisfied by just looking up the value in the nonclustered index - the leaf level node found will contain both columns that your SELECT requires.
This is NOT the case when you add another column, like qty - now, once found, a key lookup into the actual data page is necessary to get all the columns to be returned from your SELECT query. Therefore, a clustered index scan may now perform better than a nonclustered index seek plus a key lookup.
I'm pretty sure the second query would use the nonclustered index again once you include the qty column in your index:
CREATE NONCLUSTERED INDEX ix_productId
ON Sales.OrderDetails(productId)
INCLUDE (qty)
because again: now once the leaf-level page in the non-clustered index is found, all necessary columns are present on that page and can be returned to your second query.
It makes sense: when you add the qty column, using the clustered index is ideal because the non-indexed (data) columns are stored alongside the clustered index attributes.
Index: "The right index will help the optimizer find the best execution plan."
Selectivity: the density of an index = 1 divided by the number of distinct records. The lower the density, the higher the chance of the index being chosen by the optimizer, and thereby the higher the SELECTIVITY. For every index there is also a statistics object; with the following DBCC command, we can find the selectivity of a statistics object and thereby of the index.
DBCC SHOW_STATISTICS('table name','statistics name')
It also displays the density of the column combined with the clustered index columns. Low selectivity will make the optimizer ignore the index. We may see a poorly performing query even after creating an index on the search column; the reason could be low selectivity.
Eg:
SELECT sod.OrderQty ,
sod.SalesOrderID ,
sod.SalesOrderDetailID ,
sod.LineTotal
FROM Sales.SalesOrderDetail sod
WHERE sod.OrderQty = 10;
When a WHERE condition is involved, the chances of a seek are high. But in the above query, the execution plan shows a clustered index scan. Even after creating a nonclustered index on OrderQty, the optimizer still ignores it. The reason is that the OrderQty density is 1 / 41 distinct values = ~0.024; with such a low selectivity, the optimizer finds the index comparable with a scan itself.
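For illustration, the density figure above can be read directly from the statistics on that index (the statistics name is an assumption; by default it matches the index name):
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', 'IX_SalesOrderDetail_OrderQty')
    WITH DENSITY_VECTOR;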
Outdated STATISTICS could really make OPTIMIZER ignore the index.
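Stale statistics can be refreshed explicitly, for example:
UPDATE STATISTICS Sales.SalesOrderDetail WITH FULLSCAN;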
There are different things a DBA can do to help the optimizer find the best execution plan. One among those is QUERY HINTS. Introducing query hints could help, but at the same time it could hurt performance as well. An index-related query hint is WITH(INDEX()).
We can tell optimizer to use a specific index.
eg: SELECT * FROM table WITH (INDEX (0)) -- Can give index number
SELECT * FROM table WITH (INDEX (indexName)) --
These things may help you.

Optimizing my SQL queries - picking the right indexes

I have a basic table as follows.
create table Orders
(
ID INT IDENTITY(1,1) PRIMARY KEY,
Company VARCHAR(3),
ItemID INT,
BoxID INT,
OrderNum VARCHAR(5),
Status VARCHAR(5),
--about 10 more columns, varchars and ints and dates
)
I'm trying to optimize all my SQL since I am getting a fair few deadlocks and some slowness - but I'm no expert on this sort of thing!
I created a few indexes:
Clustered on the ID (Primary Key).
Non-Clustered index on ([ItemID])
Non-Clustered index on ([BoxID])
Non-Clustered index on ([Company],[OrderNum],[Status])
Maybe 1 or 2 more on some other columns
But I'm not 100% happy with the results.
SELECT * FROM Orders WHERE ItemID=100
Gives me an index seek + a key lookup and a Nested loop (Inner join).
I can see why - but I don't know if I should do anything about it. The key lookup is 97% of the batch, which seems bad!
Every query used will pull back every column in the table, but I don't like the idea of including every column in the index.
I'm making a change now to query everything on the [Company] field. Every query will be using it, because results should never contain more than 1 value. So they will all change:
SELECT * FROM Orders WHERE ItemID=100 --Old
SELECT * FROM Orders WHERE Company='a' and ItemID=100 --New
But the execution plan of that gives me exactly the same as not including company (which does surprise me!).
Why are the two execution plans above the same? (I have no index on [company] at the moment)
Is it worth adding [Company] to all my indexes, since it seems to make no difference to the execution plan?
Should I instead just add 1 single index to [Company] and keep the original indexes? - but will that mean every query will have 2 seeks?
Is it worth 'including' all other columns in my indexes to avoid the key lookup? (making the index a tonne bigger, but potentially speeding it up?) i.e.
CREATE NONCLUSTERED INDEX [IX_Orders_MyIndex] ON [Orders]
( [Company] ASC, [OrderNum] ASC, [Status] ASC )
INCLUDE ([ID],[ItemID],[BoxID],
[Column5],[Column6],[Column7],[Column8],[Column9],[Column10],etc)
That seems messy if I did it on 4 or 5 indexes.
Basically I have 4-5 queries which run quite often (some selects and updates) so I want to make it as efficient as possible.
All queries will use the [company] field, and at least 1 other. How should I go about it.
Any help appreciated :)
In your execution plan, you say that lookup takes 97% of the batch.
In this case it doesn't mean anything, because an index seek is very fast and there aren't that many operations to be done.
That lookup is actually the record you read based on the index you have specified.
Why are the two execution plans above the same? (I have no index on [company] at the moment)
Non-Clustered index on ([Company],[OrderNum],[Status])
This index will be considered only if Company, OrderNum and Status appear in your where clause.
Concatenated indexes generate a key that would look something like 0000000000000; when you pass only Company, it creates an incomplete key that requires using a wildcard for the other two values.
It would look a little like this: key LIKE 'XXX%'. This logic requires an index scan, which is time consuming.
The optimizer will determine that it's preferable to first seek the rows from the ItemID index and then scan those to match any with the required Company.
Is it worth adding [Company] to all my indexes since it seems to make 0 different to the execution plan?
You should consider having a Company index instead of adding it to all your indexes.
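For example, a minimal sketch of such an index (the index name is an assumption):
CREATE NONCLUSTERED INDEX IX_Orders_Company
ON [Orders] (Company);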
A composite index could speed things up by reducing the number of nested loops, but you have to think it through thoroughly.
The order of the fields you add to such an index is very important, they should be ordered by uniqueness to allow a better seek. Also, you should never add a field that might not be used in a query.
Should I instead just add 1 single index to [Company] and keep the original indexes? - but will that mean every query will have 2 seeks?
Having more than one index seek is not all that bad; they are usually parallelized and only the results of both are matched together.
Is it worth 'including' all other columns in my indexes to avoid the key lookup? (making the index a tonne bigger, but potentially speeding it up?)
It is worth it when only a few fields could be optional in the WHERE clause, or when you have queries that select only those fields while using the specified index.
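As a sketch, a covering index for the frequent Company + ItemID queries from the question might look like this (the index name and the choice of included columns are assumptions; it only covers queries that select these columns):
CREATE NONCLUSTERED INDEX IX_Orders_Company_ItemID
ON [Orders] (Company ASC, ItemID ASC)
INCLUDE (OrderNum, Status, BoxID);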
Last notes
Not all indexes are equal: comparing strings (varchar) is not the same as comparing numbers (integer, datetime, bytes, etc.).
Also, keeping them clean helps a lot, if your indexes are fragmented, they will be next to useless in terms of performance gain.
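Fragmentation can be dealt with by a periodic rebuild or reorganize, for example:
ALTER INDEX ALL ON [Orders] REBUILD;
-- or, as a lighter-weight alternative:
ALTER INDEX ALL ON [Orders] REORGANIZE;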

What does `INCLUDE` do in an index?

What does INCLUDE do in a nonclustered index?
CREATE NONCLUSTERED INDEX [MyIndex] ON [dbo].[Individual]
(
[IndivID] ASC
)
INCLUDE ( [LastName], [FirstName])
WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
I know the first part is used for the WHERE clause, but what do the INCLUDE columns do? What's the benefit of having them "added to the leaf level of the nonclustered index"?
Edit: Also, if I already have a clustered PK index on IndivID, why does the Tuning Advisor recommend this index?
INCLUDE columns include the associated fields WITH the index. They are not used FOR indexing, but they are placed in the leaf node of the B-tree that makes up the index.
In essence: The index is still ON [IndivID] and [IndivID] alone. However, if your query only needs a subset of [IndivID], [LastName] and [FirstName], SQL doesn't need to go back to the table after it's found the [IndivID] it's searching for in the Index.
SEE: Covering Index
EDIT: B-tree assumes MS SQL Server. I'm not positive other implementations use the same data structure.
Tuning Advisor (speculation): A clustered index places the entire data row at the leaf node of the index's B-tree, and this takes up a lot of space. If the Tuning Advisor sees that you're never accessing more than those three fields ([IndivID] + INCLUDEs), it will attempt to save you space (and insert/update time) by downgrading it to a non-clustered index with only the "important" fields present.
INCLUDE adds those fields at the leaf level of the index. Basically the B-tree is not sorted by those fields, but once the index finds the row with the indexed field(s) it's looking for, it also has the other fields immediately.
If you use the phone book analogy, the INCLUDED fields in the phone book index (which is sorted by Lastname, Firstname) would be Phone Number and Address - you can't look up a person by those fields but once you have their name you can find them.
CLUSTERED indexes have all fields included already by design, so INCLUDE is invalid in a CLUSTER. You also shouldn't bother INCLUDEing the clustered field in a non-clustered index since it is already implicitly there as the row key.
I most often use the INCLUDE fields for aggregation. For instance, if I have an index on CalendarDate and CustomerID I can include PaidAmt and get
MAX(PaidAmt) WHERE CustomerId = x AND CalendarDate = '1/1/2011'
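A sketch of that pattern (the table and index names here are illustrative, not from the question):
CREATE NONCLUSTERED INDEX IX_Payments_CalendarDate_CustomerID
ON dbo.Payments (CalendarDate, CustomerID)
INCLUDE (PaidAmt);

SELECT MAX(PaidAmt)
FROM dbo.Payments
WHERE CustomerID = 42
  AND CalendarDate = '2011-01-01';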
At the most basic level they are used to avoid a bookmark or key lookup.
That is data that is included as payload in the index. It won't be used to filter, but it can be returned.
If you, for example, have a query that filters on age and returns name:
select name
from persons
where age = 42
Then you could create an index for the age field, with the name field included. That way the database could use only the index to run the entire query, and doesn't have to read anything at all from the actual table.
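A minimal sketch of that index, using the table and columns from the example query (the index name is an assumption):
CREATE NONCLUSTERED INDEX IX_persons_age
ON dbo.persons (age)
INCLUDE (name);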
From MSDN - CREATE INDEX (Transact-SQL):
INCLUDE (column [ ,... n ] )
Specifies the non-key columns to be added to the leaf level of the nonclustered index.
Meaning, you can add more columns to the nonclustered index - if you are returning several fields every time you query on the key column, adding them to the index will improve performance as they are stored with it, aka a covering index.