Why are my SQL indexes being ignored? - sql

We're having a problem where indexes on our tables are being ignored and SQL Server 2000 is performing table scans instead. We can force the use of indexes by using the WITH (INDEX=<index_name>) clause but would prefer not to have to do this.
As a developer I'm very familiar with SQL Server when writing T-SQL, but profiling and performance tuning isn't my strong point. I'm looking for any advice and guidance as to why this might be happening.
Update:
I should have said that we've rebuilt all indexes and updated index statistics.
The table definition for one of the culprits is as follows:
CREATE TABLE [tblinvoices]
(
[CustomerID] [int] NOT NULL,
[InvoiceNo] [int] NOT NULL,
[InvoiceDate] [smalldatetime] NOT NULL,
[InvoiceTotal] [numeric](18, 2) NOT NULL,
[AmountPaid] [numeric](18, 2) NULL
CONSTRAINT [DF_tblinvoices_AmountPaid] DEFAULT (0),
[DateEntered] [smalldatetime] NULL
CONSTRAINT [DF_tblinvoices_DateEntered] DEFAULT (getdate()),
[PaymentRef] [varchar](110),
[PaymentType] [varchar](10),
[SyncStatus] [int] NULL,
[PeriodStart] [smalldatetime] NULL,
[DateIssued] [smalldatetime] NULL
CONSTRAINT [DF_tblinvoices_dateissued] DEFAULT (getdate()),
CONSTRAINT [PK_tblinvoices] PRIMARY KEY NONCLUSTERED
(
[InvoiceNo] ASC
) ON [PRIMARY]
) ON [PRIMARY]
There is one other index on this table (the one we want SQL to use):
CustomerID (Non-Unique, Non-Clustered)
The following query performs a table scan instead of using the CustomerID index:
SELECT
CustomerID,
Sum(InvoiceTotal) AS SumOfInvoiceTotal,
Sum(AmountPaid) AS SumOfAmountPaid
FROM tblInvoices
WHERE CustomerID = 2112
GROUP BY customerID
Updated:
In answer to Autocracy's question, both of those queries perform a table scan.
Updated:
In answer to Quassnoi's question about DBCC SHOW_STATISTICS, the data is:
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
1667 246 454 8 27.33333
2112 911 3427 16 56.9375
2133 914 775 16 57.125

The best thing to do is make the index a covering index by including the InvoiceTotal and AmountPaid columns in the CustomerID index. (In SQL 2005, you would add them as "included" columns". In SQL 2000, you have to add them as additional key columns.) If you do that, I'll guarantee the query optimizer will choose your index*.
Explanation:
Indexes seem like they would always be useful, but there is a hidden cost to using a (non-covering) index, and that is the "bookmark lookup" that has to be done to retrieve any other columns that might be needed from the main table. This bookmark lookup is an expensive operation, and is (one possible) reason why the query optimizer might not choose to use your index.
By including all needed columns in the index itself, this bookmark lookup is avoided entirely, and the optimizer doesn't have to play this little game of figuring out if using an index is "worth it".
(*) Or I'll refund your StackOverflow points. Just send a self-addressed, stamped envelope to...
Edit: Yes, if your primary key is NOT a clustered index, then by all means, do that, too!! But even with that change, making your CustomerID index a covering index should increase performance by an order of magnitude (10x or better)!!

We're having a problem where indexes on our tables are being ignored and SQL Server 2000 is performing table scans instead.
Despite 4,302 days that have passed since Aug 29, 1997, SQL Server's optimizer has not evolved into SkyNet yet, and it still can make some incorrect decisions.
Index hints are just the way you, a human being, help the artificial intelligence.
If you are sure that you collected statistics and the optimizer is still wrong, then go on, use the hints.
They are legitimate, correct, documented and supported by Microsoft way to enforce the query plan you want.
In your case:
SELECT CustomerID,
SUM(InvoiceTotal) AS SumOfInvoiceTotal,
SUM(AmountPaid) AS SumOfAmountPaid
FROM tblInvoices
WHERE CustomerID = 2112
GROUP BY
CustomerID
, the optimizer has two choises:
Use the index which implies a nested loop over the index along with KEY LOOKUP to fetch the values of InvoiceTotal and AmountPaid
Do not use the index and scan all tables rows, which is faster in rows fetched per second, but longer in terms of total row count.
The first method may or may not be faster than the second one.
The optimizer tries to estimate which method is faster by looking into the statistics, which keep the index selectivity along with other values.
For selective indexes, the former method is faster; for non-selective ones, the latter is.
Could you please run this query:
SELECT 1 - CAST(COUNT(NULLIF(CustomerID, 2112)) AS FLOAT) / COUNT(*)
FROM tlbInvoices
Update:
Since CustomerID = 2112 covers only 1,4% of your rows, you should benefit from using the index.
Now, could you please run the following query:
DBCC SHOW_STATISTICS ([tblinvoices], [CustomerID])
, locate two adjacents rows in the third resultset with RANGE_HI_KEY being less and more than 2112, and post the rows here?
Update 2:
Since the statistics seem to be correct, we can only guess why the optimizer chooses full table scan in this case.
Probably (probably) this is because this very value (2112) occurs in the RANGE_HI_KEY and the optimizer sees that it's unusually dense (3427 values for 2112 alone against only 911 for the whole range from 1668 to 2111)
Could you please do two more things:
Run this query:
DBCC SHOW_STATISTICS ([tblinvoices], [CustomerID])
and post the first two resultsets.
Run this query:
SELECT TOP 1 CustomerID, COUNT(*)
FROM tblinvoices
WHERE CustomerID BETWEEN 1668 AND 2111
, use the top CustomerID from the query above in your original query:
SELECT CustomerID,
SUM(InvoiceTotal) AS SumOfInvoiceTotal,
SUM(AmountPaid) AS SumOfAmountPaid
FROM tblInvoices
WHERE CustomerID = #Top_Customer
GROUP BY
CustomerID
and see what plan will it generate.

The most common reasons for indexes to be ignored are
Columns involved are not selective enough (optimiser decides tables scans will be faster, due to 'visiting' a large amount of rows)
There are a large number of columns involved in SELECT/GROUP BY/ORDER BY and would involve a lookup into the clustered index after using the index
Statistics being out of date (or skewed by a large number of inserts or deletes)
Do you have a regular index maintenance job running? (it is quite common for it to be missing in Dev environment).

Latest post from Kimberly covers exactly this topic: http://www.sqlskills.com/BLOGS/KIMBERLY/post/The-Tipping-Point-Query-Answers.aspx
SQL Server uses a cost based optimizer and if the optimizer calculates that the cost of looking up the index keys and then look up the clustered index to retrieve the rest of the columns is higher than the cost of scanning the table, then it will scan the table instead. The 'tipping' point is actually surprisingly low.

Have you tried adding the other columns to your index? i.e. InvoiceTotal and AmountPaid.
The idea being that the query will be "covered" by the index, and won't have to refer back to the table.

I would start testing to see if you can change the primary key to a clustered index. Right now the table is considered a "heap". If you can't do this then I would also consider creating a view with a clustered index but first you'd have to change the "AmountPaid" column to NOT NULL. It already defaults to zero so this might be an easy change. For the view I'd try something similar to this.
SET QUOTED_IDENTIFIER, ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON
GO
SET NUMERIC_ROUNDABORT OFF
GO
IF EXISTS
(
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.VIEWS
WHERE TABLE_NAME = N'CustomerInvoiceSummary'
)
DROP VIEW dbo.CustomerInvoiceSummary
GO
CREATE VIEW dbo.CustomerInvoiceSummary WITH SCHEMABINDING
AS
SELECT a.CustomerID
, Sum(a.InvoiceTotal) AS SumOfInvoiceTotal
, Sum(a.AmountPaid) AS SumOfAmountPaid
, COUNT_BIG(*) AS CT
FROM dbo.tblInvoices a
GROUP BY a.CustomerID
GO
CREATE UNIQUE CLUSTERED INDEX CustomerInvoiceSummary_CLI ON dbo.CustomerInvoiceSummary ( CustomerID )
GO

I think I just found it. I was reading the comments posted to your question before I noted that the two queries I gave you were expected to cause table scan, and I just wanted the result. That said, it caught my interest when somebody said you had no clustered indexes. I read your SQL create statement in detail, and was surprised to note that was the case. This is why it isn't using your CustomerId index.
Your CustomerId index references your primary key of InvoiceNo. Your primary key, however, isn't clustered, so then you'd have to look in that index to find where the row actually is. The SQL server won't do two non-clustered index lookups to find a row. It'll just table scan.
Make your InvoiceNo a clustered index. We can assume those will generally be inserted in ascending manner, and thus the insertion cost won't be much higher. Your query cost, however, will be much lower. Dollars to donuts, it'll use your index then.
Edit: I like BradC's suggestion as well. It's a common DBA trick. Like he says, though, make that primary clustered anyway since this is the CAUSE of your problem. It is very rare to have a table with no clustered index. Most of the time it isn't used, it's a bad idea. That said, his covering index is an improvement ON TOP OF clustering that should be done.

Several others have pointed out that your database may need the index statistics updated. You may also have such a high percentage of rows in the database that it would be faster to sequentially read the table than to seek across the disk to find every one. SQL Server has a fancy GUI query analyzer that will tell you what the database thinks the cost of various activiites is. You can open that up and see exactly what it was thinking.
We can give you more solid answers if you can give us:
Select * from tblinvoices;
Select * from tblinvoices where CustomerID = 2112;
Use that query analyzer, and update your statistics. One last hint: you can use index hints to force it to use your index if you're sure it's just being stupid after you've done everything else.

Have you tried
exec sp_recompile tblInvoices
...just to make sure you're not using a cached bad plan?

You could also try doing an UPDATE STATISTICS on the table (or tables) involved in the query. Not that I fully understand statistics in SQL, but I do know it is something our DBAs do occasionally (with an weekly job being scheduled to update stats on the larger and frequently changed tables).
SQL Statistics

Try updating your statistics. These statistics are the basis for the decisions made by the compiler about whether it should use an index or not. They contain information such as cardinality and number of rows for each table.
E.g., if the statistics have not been updated since you did a large bulk import, the compiler may still think the table has only 10 rows in it, and not bother with an index.

Are you using "SELECT * FROM ..."? This generally results in scans.
We'd need schema, indexes and sample queries to help more

Related

MS SQL: Performance for querying ID descending

This question relates to a table in Microsoft SQL Server which is usually queried with ORDER BY Id DESC.
Would there be a performance benefit from setting the primary key to PRIMARY KEY CLUSTERED (Id DESC)? Or would there be a need for an index? Or is it as fast as it gets without any of it?
Table:
CREATE TABLE [dbo].[Items] (
[Id] INT IDENTITY (1, 1) NOT NULL,
[Category] INT NOT NULL,
[Name] NVARCHAR(255) NULL,
CONSTRAINT [PK_Items] PRIMARY KEY CLUSTERED ([Id] ASC)
)
Query:
SELECT TOP 1 * FROM [dbo].[Items]
WHERE Catgory = 123
ORDER BY [Id] DESC
Would there be a performance benefit from setting the primary key to PRIMARY KEY
CLUSTERED (Id DESC)?
Given as you show is: IT DEPENDS.
The filter is on Category = 123. To find all entries of Category 123, because there is NO INDEX defined, the server has to do a table scan. Unless you havea VERY large result set, and / or some awfully comically bad configured tempdb and very low memory (because disc is only used when running out of memory for tempdb) the sorting of hte result will be irrelevant compared to the table scan.
You are literally following the wrong tail. You are WAY more likely to speed up the query by adding a non-unique index to Cateogory so that the query can prefilter the data fast based on your query condition.
If you would analzy the query plan for this query (which you should - technically we should not even ANSWER this quesstion without you showing SOME effort, and a look at the query plan is like the FIRST thing you do) you would very likely see that the time is spent on on the query, NOT the result sort.
Creating an index in asc or desc order does not make a big difference in “ORDER BY” when there is only one column, but when there is a need to sort data in two different directions one column in ascending order and the other column in descending order the way the index is created does make a big difference.
Look this article that do many example:
https://www.mssqltips.com/sqlservertip/1337/building-sql-server-indexes-in-ascending-vs-descending-order/
In your scenario I advise you to create an index on Category Column without include “Id” because the clustered index is always included in non-clustered index.
There is no difference according to the following
I'd suggest defining an index on (category, id desc).
It will give you best performance for your query.
As others have indicated, an index on Category (assuming you don't have one) is the biggest performance boost possible here.
But as for your actual question. For a single order by query like you have, it does not matter if the query/index is ordered by desc or asc as far as performance goes. SQL Server can swap those easily (starting a the beginning or the end of the data structure)
Where performance becomes an issue for performance is when you:
Have more than order by column
Your index has more than one column
Your order by is opposing the order on the index.
So, say your Primary Key had ID asc and Category asc, and then you query by ID asc and Category desc. Then SQL Server can't use the order on the index to do the search.
There are a few caveats and gotchas. After searching a bit, this answer seems to have them listed:
SQL Server indexes - ascending or descending, what difference does it make?

SQL index for date range query

For a few days, I've been struggling with improving the performance of my database and there are some issues that I'm still kind a confused about regarding indexing in a SQL Server database.
I'll try to be as informative as I can.
My database currently contains about 100k rows and will keep growing, therfore I'm trying to find a way to make it work faster.
I'm also writing to this table, so if you suggestion will drastically reduce the writing time please let me know.
Overall goal is to select all rows with a specific names that are in a date range.
That will usually be to select over 3,000 rows out of a lot lol ...
Table schema:
CREATE TABLE [dbo].[reports]
(
[id] [int] IDENTITY(1,1) NOT NULL,
[IsDuplicate] [bit] NOT NULL,
[IsNotValid] [bit] NOT NULL,
[Time] [datetime] NOT NULL,
[ShortDate] [date] NOT NULL,
[Source] [nvarchar](350) NULL,
[Email] [nvarchar](350) NULL,
CONSTRAINT [PK_dbo.reports]
PRIMARY KEY CLUSTERED ([id] ASC)
) ON [PRIMARY]
This is the SQL query I'm using:
SELECT *
FROM [db].[dbo].[reports]
WHERE Source = 'name1'
AND ShortDate BETWEEN '2017-10-13' AND '2017-10-15'
As I understood, my best approach to improve efficency without hurting the writing time as much would be to create a nonclustered index on the Source and ShortDate.
Which I did like such, index schema:
CREATE NONCLUSTERED INDEX [Source&Time]
ON [dbo].[reports]([Source] ASC, [ShortDate] ASC)
Now we are getting to the tricky part which got me completely lost, the index above sometimes works, sometime half works and sometime doesn't work at all....
(not sure if it matters but currently 90% of the database rows has the same Source, although this won't stay like that for long)
With the query below, the index isn't used at all, I'm using SQL Server 2014 and in the Execution Plan it says it only uses the clustered index scan:
SELECT *
FROM [db].[dbo].[reports]
WHERE Source = 'name1'
AND ShortDate BETWEEN '2017-10-10' AND '2017-10-15'
With this query, the index isn't used at all, although I'm getting a suggestion from SQL Server to create an index with the date first and source second... I read that the index should be made by the order the query is? Also it says to include all the columns Im selecting, is that a must?... again I read that I should include in the index only the columns I'm searching.
SELECT *
FROM [db].[dbo].[reports]
WHERE Source = 'name1'
AND ShortDate = '2017-10-13'
SQL Server index suggestion -
/* The Query Processor estimates that implementing the following
index could improve the query cost by 86.2728%. */
/*
USE [db]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[reports] ([ShortDate], [Source])
INCLUDE ([id], [IsDuplicate], [IsNotValid], [Time], [Email])
GO
*/
Now I tried using the index SQL Server suggested me to make and it works, seems like it uses 100% of the nonclustered index using both the queries above.
I tried to use this index but deleting the included columns and it doesn't work... seems like I must include in the index all the columns I'm selecting?
BTW it also work when using the index I made if I include all the columns.
To summarize: seems like the order of the index didn't matter, as it worked both when creating Source + ShortDate and ShortDate + Source
But for some reason its a must to include all the columns... (which will drastically affect the writing to this table?)
Thanks a lot for reading, My goal is to understand why this stuff happens and what I should do otherwise (not just the solution as I'll need to apply it on other projects as well ).
Cheers :)
Indexing in SQL Server is part know-how from long experience (and many hours of frustration), and part black magic. Don't beat yourself up over that too much - that's what a place like SO is ideal for - lots of brains, lots of experience from many hours of optimizing, that you can tap into.
I read that the index should be made by the order the query is?
If you read this - it is absolutely NOT TRUE - the order of the columns is relevant - but in a different way: a compound index (made up from multiple columns) will only ever be considered if you specify the n left-most columns in the index definition in your query.
Classic example: a phone book with an index on (city, lastname, firstname). Such an index might be used:
in a query that specifies all three columns in its WHERE clause
in a query that uses city and lastname (find all "Miller" in "Detroit")
or in a query that only filters by city
but it can NEVER EVER be used if you want to search only for firstname ..... that's the trick about compound indexes you need to be aware of. But if you always use all columns from an index, their ordering is typically not really relevant - the query optimizer will handle this for you.
As for the included columns - those are stored only in the leaf level of the nonclustered index - they are NOT part of the search structure of the index, and you cannot specify filter values for those included columns in your WHERE clause.
The main benefit of these included columns is this: if you search in a nonclustered index, and in the end, you actually find the value you're looking for - what do you have available at that point? The nonclustered index will store the columns in the non-clustered index definition (ShortDate and Source), and it will store the clustering key (if you have one - and you should!) - but nothing else.
So in this case, once a match is found, and your query wants everything from that table, SQL Server has to do what is called a Key lookup (often also referred to as a bookmark lookup) in which it takes the clustered key and then does a Seek operation against the clustered index, to get to the actual data page that contains all the values you're looking for.
If you have included columns in your index, then the leaf level page of your non-clustered index contains
the columns as defined in the nonclustered index
the clustering key column(s)
all those additional columns as defined in your INCLUDE statement
If those columns "cover" your query, e.g. provide all the values that your query needs, then SQL Server is done once it finds the value you searched for in the nonclustered index - it can take all the values it needs from that leaf-level page of the nonclustered index, and it does NOT need to do another (expensive) key lookup into the clustering index to get the actual values.
Because of this, trying to always explicitly specify only those columns you really need in your SELECT can be beneficial - in this case, you might be able to create an efficient covering index that provides all the values for your SELECT - always using SELECT * makes that really hard or next to impossible.....
In general, you want the index to be from most selective (i.e. filtering out the most possible records) to least selective; if a column has low cardinality, the query optimizer may ignore it.
That makes intuitive sense - if you have a phone book, and you're looking for people called "smith", with the initial "A", you want to start with searching for "smith" first, and then the "A"s, rather than all people whose initial is "A" and then filter out those called "Smith". After all, the odds are that one in 26 people have the initial "A".
So, in your example, I guess you have a wide range of values in short date - so that's the first column the query optimizer is trying to filter out. You say you have few different values in "source", so the query optimizer may decide to ignore it; in that case, the second column in that index is no use either.
The order of where clauses in the index is irrelevant - you can swap them round and achieve the exact same results, so the query optimizer ignores them.
EDIT:
So, yes, make the index. Imagine you have a pile of cards to sort - in your first run, you want to remove as many cards as possible. Assuming it's all evenly spread - if you have 1000 separate short_dates over a million rows, that means you end up with 1000 items if your first run starts on short_date; if you sort by source, you have 100000 rows.
The included columns of an index is for the columns you are selecting.
Due to the fact that you do select * (which isn't good practice), the index won't be used, because it would have to lookup the whole table to get the values for the columns.
For your scenario, I would drop the default clustered index (if there is one) and create a new clustered index with the following statement:
USE [db]
GO
CREATE CLUSTERED INDEX CIX_reports
ON [dbo].[reports] ([ShortDate],[Source])
GO

Index for table in SQL Server 2012

I had a question on indexes. I have a table like this:
id BIGINT PRIMARY KEY NOT NULL,
cust_id VARCHAR(8) NOT NULL,
dt DATE NOT NULL,
sale_type VARCHAR(10) NOT NULL,
sale_type_sub VARCHAR(40),
amount DOUBLE PRECISION NOT NULL
The table has several million rows. Assuming that queries will often filter results by date ranges, sale types, amounts above and below certain values, and that joins will occur on cust_id... what do you all think is the ideal index structure?
I wasn't sure if a clustered index would be best, or individual indexes on each column? Both?
Any serious table in SQL Server should always have a well-chosen, good clustering key - it makes so many things faster and more efficient. From you table structure, I'd use the ID as the clustering key.
Next, you say joins occur on cust_id - so I would put an index on cust_id. This speeds up joins in general and is a generally accepted recommendation.
Next, it really depends on your queries. Are they all using the same columns in their WHERE clauses? Or do you get queries that use dt, and others that use sale_type separately?
The point is: the fewer indices the better - so if ever possible, I'd try to find one compound index that covers all your needs. But if you have an index on three columns (e.g. on (sale_type, dt, amount), then that index can be used for queries
using all three columns in the WHERE clause
using sale_type and dt in the WHERE clause
using only sale_type in the WHERE clause
but it could NOT be used for queries that use dt or amount alone. A compound index always requires you to use the n left-most columns in the index definition - otherwise it cannot be used.
So my recommendation would be:
define the clustering key on ID
define a nonclustered index on cust_id for the JOINs
examine your system to see what other queries you have - what criteria is being used for selection, how often do those queries execute? Don't over-optimize a query that's executed once a month - but do spend time on those that are executed dozens of times every hour.
Add one index at a time - let the system run for a bit - do you measure an improvement in query times? Does it feel faster? If so: leave that index. If not: drop it again. Iterate until you're happy with the overall system performance.
The best way to find indexes for your table is sql server profiler.

Optimizing my SQL queries - picking the right indexes

I have a basic table as follows.
create table Orders
(
ID INT IDENTITY(1,1) PRIMARY KEY,
Company VARCHAR(3),
ItemID INT,
BoxID INT,
OrderNum VARCHAR(5),
Status VARCHAR(5),
--about 10 more columns, varchars and ints and dates
)
I'm trying to optimize all my SQL since I am getting a fair few deadlocks and some slowness - but I'm no expert on this sort of thing!
I created a few indexes:
Clustered on the ID (Primary Key).
Non-Clustered index on ([ItemID])
Non-Clustered index on ([BoxID])
Non-Clustered index on ([Company],[OrderNum],[Status])
Maybe 1 or 2 more on some other columns
But I'm not 100% happy with the results.
SELECT * FROM Orders WHERE ItemID=100
Gives me an index seek + a key lookup and a Nested loop (Inner join).
I can see why - but don't know if I should do anything about it. They key lookup is 97% of the batch which seems bad!
Every query used will pull back every column in the table, but I don't like the idea of including every column in the index.
I'm making a change now to query everything on the [Company] field. Every query will be using it, because results should never contain more than 1 value. So they will all change:
SELECT * FROM Orders WHERE ItemID=100 --Old
SELECT * FROM Orders WHERE Company='a' and ItemID=100 --New
But the execution plan of that gives me exactly the same as not including company (which does surprise me!).
Why are the two execution plans above the same? (I have no index on [company] at the moment)
Is it worth adding [Company] to all my indexes since it seems to make
0 different to the execution plan?
Should I instead just add 1 single index to [Company] and keep the original indexes? - but will that
mean every query will have 2 seeks?
Is it worth 'including' all other columns in my indexes to avoid the
key lookup? (making the index a tonne bigger, but potentially
speeding it up?) i.e.
CREATE NONCLUSTERED INDEX [IX_Orders_MyIndex] ON [Orders]
( [Company] ASC, [OrderNum] ASC, [Status] ASC )
INCLUDE ([ID],[ItemID],[BoxID],
[Column5],[Column6],[Column7],[Column8],[Column9],[Column10],etc)
That seems messy if I did it on 4 or 5 indexes.
Basically I have 4-5 queries which run quite often (some selects and updates) so I want to make it as efficient as possible.
All queries will use the [company] field, and at least 1 other. How should I go about it.
Any help appreciated :)
In your execution plan, you say that lookup takes 97% of the batch.
In this case it doesn't mean anything because an index seek is very fast and you didn't have that much operation to be done.
That lookup is actually the record you read based on the index you have specified.
Why are the two execution plans above the same? (I have no index on [company] at the moment)
Non-Clustered index on ([Company],[OrderNum],[Status])
This index will be considered only if Company, OrderNum and Status appear in your where clause.
Concatenated indexes generates a key that would look like this 0000000000000 when you pass only company it creates an incomplete key that requires using wildcard for the other to values.
It would look a little like this : key like 'XXX%' this logic will require an index scan which is time consuming.
The optimizer will determine that it's preferable to first seek and rows from the ItemID index and then scan these to match any with the required company.
Is it worth adding [Company] to all my indexes since it seems to make 0 different to the execution plan?
You should consider having a Company index instead of adding it to all your indexes.
Composite index could speed things up by reducing the number of nested loops, but you have to think then thoroughly.
The order of the fields you add to such an index is very important, they should be ordered by uniqueness to allow a better seek. Also, you should never add a field that might not be used in a query.
Should I instead just add 1 single index to [Company] and keep the original indexes? - but will that mean every query will have 2 seeks?
Having more than one index seek is not all that bad, they are usually paralleled and only the result of both are matched together.
Is it worth 'including' all other columns in my indexes to avoid the key lookup? (making the index a tonne bigger, but potentially speeding it up?)
It is worth when it's only a few fields that could be optional in the where clause or when you have queries that select only those fields when you are using the specified index.
Last notes
All indexes are not equal, comparing string (varchar) is not the same as comparing numbers (integer, datetime, bytes, etc).
Also, keeping them clean helps a lot, if your indexes are fragmented, they will be next to useless in terms of performance gain.

Getting RID Lookup instead of Table Scan?

SQL Fiddle: http://sqlfiddle.com/#!3/23cf8
In this query, when I have an In clause on an Id, and then also select other columns, the In is evaluated first, and then the Details column and other columns are pulled in via a RID Lookup:
--In production and in SQL Fiddle, Details is grabbed via a RID Lookup after the In clause is evaluated
SELECT [Id]
,[ForeignId]
,Details
--Generate a numbering(starting at 1)
--,Row_Number() Over(Partition By ForeignId Order By Id Desc) as ContactNumber --Desc because older posts should be numbered last
FROM SupportContacts
Where foreignId In (1,2,3,5)
With this query, the Details are being pulled in via a Table Scan.
With NumberedContacts AS
(
SELECT [Id]
,[ForeignId]
--Generate a numbering(starting at 1)
,Row_Number() Over(Partition By ForeignId Order By Id Desc) as ContactNumber --Desc because older posts should be numbered last
FROM SupportContacts
Where ForeignId In (1,2,3,5)
)
Select nc.[Id]
,nc.[ForeignId]
,sc.[Details]
From NumberedContacts nc
Inner Join SupportContacts sc on nc.Id = sc.Id
Where nc.ContactNumber <= 2 --Only grab the last 2 contacts per ForeignId
;
In SqlFiddle, the second query actually gets a RID Lookup, whereas in production with a million records it produces a Table Scan (the IN clause eliminates 99% of the rows)
Otherwise the query plan shown in SQL Fiddle is identical, the only difference being that for the second query the RID Lookup in SQL Fiddle, is a Table Scan in production :(
I would like to understand possibilities that would cause this behavior? What kinds of things would you look at to help determine the cause of it using a table scan here?
How can I influence it to use a RID Lookup there?
From looking at operation costs in the actual execution plan, I believe I can get the second query very close in performance to the first query if I can get it to use a RID Lookup. If I don't select the Detail column, then the performance of both queries is very close in production. It is only after adding other columns like Detail that performance degrades significantly for the second query. When I put it in SQL Fiddle and saw that the execution plan used an RID Lookup, I was surprised but slightly confused...
It doesn't have a clustered index because in testing with different clustered indexes, there was slightly worse performance for this and other queries. That was before I began adding other columns like Details though, and I can experiment with that more, but would like to have a understanding of what is going on now before I start shooting in the dark with random indexes.
What if you would change your main index to include the Details column?
If you use:
CREATE NONCLUSTERED INDEX [IX_SupportContacts_ForeignIdAsc_IdDesc]
ON SupportContacts ([ForeignId] ASC, [Id] DESC)
INCLUDE (Details);
then neither a RID lookup nor a table scan would be needed, since your query could be satisfied from just the index itself....
The differences in the query plans will be dependent on the types of indexes that exist and the statistics of the data for those tables in the different environments.
The optimiser uses the statistics (histograms of data frequency, mostly) and the available indexes to decide which execution plan is going to be the quickest.
So, for example, you have noticed that the performance degrades when the 'Details' column is included. This is an almost sure sign that either the 'Details' column is not part of an index, or if it is part of an index, the data in that column is mostly unique such that the index accesses would be equivalent (or almost equivalent) to a table scan.
Often when this situation arises, the optimiser will choose a table scan over the index access, as it can take advantage of things like block reads to access the table records faster than perhaps a fragmented read of an index.
To influence the path that will be chose by the optimiser, you would need to look at possible indexes that could be added/modified to make an index access more efficient, but this should be done with care as it can adversely affect other queries as well as possibly degrading insert performance.
The other important activity you can do to help the optimiser is to make sure the table statistics are kept up to date and refreshed at a frequency that is appropriate to the rate of change of the frequency distribution in the table data
If it's true that 99% of the rows would be omitted if it performed the query using the relevant index + RID then the likeliest problem in your production environment is that your statistics are out of date and the optimiser doesn't realise that ForeignID in (1,2,3,5) would limit the result set to 1% of the total data.
Here's a good link for discovering more about statistics from Pinal Dave: http://blog.sqlauthority.com/2010/01/25/sql-server-find-statistics-update-date-update-statistics/
As for forcing the optimiser to follow the correct path WITHOUT updating the statistics, you could use a table hint - if you know the index that your plan should be using which contains the ID and ForeignID columns then stick that in your query as a hint and force SQL optimiser to use the index:
http://msdn.microsoft.com/en-us/library/ms187373.aspx
FYI, if you want the best performance from your second query, use this index and avoid the headache you're experiencing altogether:
create index ix1 on SupportContacts(ForeignID, Id DESC) include (Details);