After adding index, execution plan still shows index is missing - SQL

I am working on SQL Server 2008. I created indexes for my Transaction_tbl like this:
CREATE INDEX INloc ON transaction_tbl
(
Locid ASC
)
CREATE INDEX INDtim ON transaction_tbl
(
DTime ASC
)
CREATE INDEX INStatus ON transaction_tbl
(
Status ASC
)
CREATE INDEX INDelda ON transaction_tbl
(
DelDate ASC
)
Then I checked the execution plan of my SQL query, but it shows that an index is missing.
My query is like this:
SELECT [transactID]
,[TBarcode]
,[cmpid]
,[Locid]
,[PSID]
,[PCID]
,[PCdID]
,[PlateNo]
,[vtid]
,[Compl]
,[self]
,[LstTic]
,[Gticket]
,[Cticket]
,[Ecode]
,[dtime]
,[LICID]
,[PAICID]
,[Plot]
,[mkid]
,[mdlid]
,[Colid]
,[Comments]
,[Kticket]
,[PAmount]
,[Payid]
,[Paid]
,[Paydate]
,[POICID]
,[DelDate]
,[DelEcode]
,[PAICdate]
,[KeyRoomDate]
,[Status]
FROM [dbo].[Transaction_tbl] where locid='9'
I want to know why, after adding the index, the execution plan still shows that an index is missing. If I give my WHERE condition like this:
where locid= 9 and status=5 and DelDate='2014-04-05 10:10:00' and dtime='2014-04-05 10:10:00'
then it does not show that an index is missing. How come? If anyone knows, please help me clarify this.
My missing-index details are like this:
/*
Missing Index Details from SQLQuery14.sql - WIN7-PC.Vallett (sa (67))
The Query Processor estimates that implementing the following index could improve the query cost by 71.5363%.
*/
/*
USE [Vallett]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Transaction_tbl] ([Locid])
INCLUDE ([transactID],[TBarcode],[cmpid],[PSID],[PCID],[PCdID],[PlateNo],[vtid],[Compl],[self],[LstTic],[Gticket],[Cticket],[Ecode],[dtime],[LICID],[PAICID],[Plot],[mkid],[mdlid],[Colid],[Comments],[Kticket],[PAmount],[Payid],[Paid],[Paydate],[POICID],[DelDate],[DelEcode],[PAICdate],[KeyRoomDate],[Status])
GO
*/

The index that SQL Server is suggesting is not the same as the one you already have - yours is on LocID alone, while the one SQL Server suggests also includes a whole bunch of extra columns, so it would become a covering index, i.e. your SELECT statement could be satisfied from just the index.
With your existing index, the SELECT might use the INloc index to find a row - but to be able to return all those columns you're selecting, it will need to do rather expensive key lookups into the actual table data (most likely into the clustered index).
If you added the index suggested by SQL Server, since that index itself would contain all those columns needed, then no key lookups into the data pages would be necessary and thus the operation would be much faster.
So maybe you could check whether you really need all those columns in your SELECT, and if not, trim the list of columns.
However: adding this many columns to your INCLUDE list is usually not a good idea. I think the index suggestion here is not useful, and I'd ignore it.
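For illustration only (this column list is a guess - check which columns your application actually needs, and the index name is made up), a trimmed SELECT with a matching narrow covering index might look like this:

```sql
-- Hypothetical: suppose the application only needs these four columns
SELECT [transactID], [TBarcode], [PlateNo], [Status]
FROM [dbo].[Transaction_tbl]
WHERE Locid = 9;

-- Then a much narrower covering index satisfies the query
-- without any key lookups into the clustered index:
CREATE NONCLUSTERED INDEX IX_Transaction_Locid_Cover
ON [dbo].[Transaction_tbl] ([Locid])
INCLUDE ([transactID], [TBarcode], [PlateNo], [Status]);
```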
Update: also, if your SELECT always contains these four criteria
WHERE locid = 9 AND status = 5 AND DelDate='2014-04-05 10:10:00' AND dtime='2014-04-05 10:10:00'
then maybe you could try to have a single index that contains all four columns:
CREATE INDEX LocStatusDates
ON transaction_tbl(Locid, Status, DelDate, dtime)
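A sketch of why this single compound index can pull double duty: since Locid is the leading column, the same index can also serve a query that filters on Locid alone (columns not in the index would still need key lookups, though):

```sql
-- Seeks using the leading column only:
SELECT [transactID], [TBarcode]
FROM transaction_tbl
WHERE Locid = 9;

-- Seeks using all four key columns:
SELECT [transactID], [TBarcode]
FROM transaction_tbl
WHERE Locid = 9 AND Status = 5
  AND DelDate = '2014-04-05 10:10:00'
  AND dtime   = '2014-04-05 10:10:00';
```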

Related

Fastest Way To Get Count From A Table With Conditions?

I am using SQL Server 2017 and EF Core 2.2. One of my tables currently has 5 million records in it.
I want to group all these records by "CategoryId" and then have a count for each one.
I also need to filter out with a where clause.
However, even if I write the query in SQL, it still takes around a minute to get these numbers.
This is way too slow and I need something faster.
select CategoryId, count(*) from Items where Deleted = 'False'
group by CategoryId
I am guessing that EF Core probably won't have a solution that is fast enough, so I am open to using ADO.NET if needed. I just need something that is fast.
Consider creating an indexed view to materialize the aggregation:
CREATE VIEW dbo.ItemCategory
WITH SCHEMABINDING
AS
SELECT CategoryId, COUNT_BIG(*) AS CountBig
FROM dbo.Items
WHERE Deleted = 'False'
GROUP BY CategoryId;
GO
CREATE UNIQUE CLUSTERED INDEX cdx_ItemCategory
ON dbo.ItemCategory (CategoryId);
GO
Using this view for the aggregated result will improve performance significantly:
SELECT CategoryId, CountBig
FROM dbo.ItemCategory;
Depending on your SQL Server edition, you may need to specify the NOEXPAND hint for the view index to be used:
SELECT CategoryId, CountBig
FROM dbo.ItemCategory WITH (NOEXPAND);
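For what it's worth (this is edition-dependent behavior, so verify it on your setup): on Enterprise edition the optimizer can match the indexed view automatically, so even the original aggregate against the base table may be answered from the view's index:

```sql
-- On Enterprise edition, the optimizer may rewrite this to use
-- dbo.ItemCategory's clustered index automatically:
SELECT CategoryId, COUNT_BIG(*) AS CountBig
FROM dbo.Items
WHERE Deleted = 'False'
GROUP BY CategoryId;
```

On other editions, query the view directly with the NOEXPAND hint as shown above.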
You'd better add indexes on Deleted and CategoryId, or put all the deleted items in a separate table.
You should have a covering index for your query to make it fast; other than that, there is no shortcut, as your query will need to read every qualifying row to count by category ID.
I have a table with 5 million rows; almost 4.7 million of them are set to Deleted = 'False'. Without the covering index, my query takes about 12 seconds, and the execution plan shows a scan on the clustered index.
Once I create the following covering index, the query executes in less than a second; the execution plan looks almost the same, except that it does a seek on the nonclustered index rather than a scan on the clustered index:
Index Definition:
CREATE NONCLUSTERED INDEX [Test_Index]
ON [dbo].[Test] ([IsDeleted])
INCLUDE ([CategoryId])
With this covering index, SQL Server only needs to look at the index to return the results, rather than looking at your whole table.
If you really want to speed this query up, there is another, very specific way: create a filtered index tailored to your query.
Index definition would be:
CREATE NONCLUSTERED INDEX [Test_Index2]
ON [dbo].[Test] ([CategoryId])
WHERE IsDeleted = 'False'
With this filtered index my query was practically instant. I didn't measure IO time on my query, but I would expect a few milliseconds. The execution plan changed slightly with this index.
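If you want to measure the IO for yourself (a standard technique, using the table and column names from this answer), SET STATISTICS IO/TIME reports logical reads and CPU/elapsed time for each statement:

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Compare the logical reads shown in the Messages tab
-- before and after creating the filtered index:
SELECT CategoryId, COUNT(*) AS Cnt
FROM dbo.Test
WHERE IsDeleted = 'False'
GROUP BY CategoryId;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```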

SQL index for date range query

For a few days, I've been struggling with improving the performance of my database and there are some issues that I'm still kind a confused about regarding indexing in a SQL Server database.
I'll try to be as informative as I can.
My database currently contains about 100k rows and will keep growing, therefore I'm trying to find a way to make it work faster.
I'm also writing to this table, so if your suggestion will drastically slow down writes, please let me know.
The overall goal is to select all rows with a specific name that fall within a date range.
That will usually mean selecting over 3,000 rows out of a lot more, lol...
Table schema:
CREATE TABLE [dbo].[reports]
(
[id] [int] IDENTITY(1,1) NOT NULL,
[IsDuplicate] [bit] NOT NULL,
[IsNotValid] [bit] NOT NULL,
[Time] [datetime] NOT NULL,
[ShortDate] [date] NOT NULL,
[Source] [nvarchar](350) NULL,
[Email] [nvarchar](350) NULL,
CONSTRAINT [PK_dbo.reports]
PRIMARY KEY CLUSTERED ([id] ASC)
) ON [PRIMARY]
This is the SQL query I'm using:
SELECT *
FROM [db].[dbo].[reports]
WHERE Source = 'name1'
AND ShortDate BETWEEN '2017-10-13' AND '2017-10-15'
As I understand it, my best approach to improve efficiency without hurting the write time too much would be to create a nonclustered index on Source and ShortDate.
Which I did, as follows; index schema:
CREATE NONCLUSTERED INDEX [Source&Time]
ON [dbo].[reports]([Source] ASC, [ShortDate] ASC)
Now we are getting to the tricky part which got me completely lost: the index above sometimes works, sometimes half works, and sometimes doesn't work at all...
(Not sure if it matters, but currently 90% of the database rows have the same Source, although this won't stay that way for long.)
With the query below, the index isn't used at all. I'm using SQL Server 2014, and in the Execution Plan it says it only uses a clustered index scan:
SELECT *
FROM [db].[dbo].[reports]
WHERE Source = 'name1'
AND ShortDate BETWEEN '2017-10-10' AND '2017-10-15'
With this next query, the index isn't used at all either, although I'm getting a suggestion from SQL Server to create an index with the date first and source second... I read that the index should match the order of the query? Also, it says to include all the columns I'm selecting - is that a must?... Again, I read that I should include in the index only the columns I'm searching on.
SELECT *
FROM [db].[dbo].[reports]
WHERE Source = 'name1'
AND ShortDate = '2017-10-13'
SQL Server index suggestion -
/* The Query Processor estimates that implementing the following
index could improve the query cost by 86.2728%. */
/*
USE [db]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[reports] ([ShortDate], [Source])
INCLUDE ([id], [IsDuplicate], [IsNotValid], [Time], [Email])
GO
*/
Now I tried the index SQL Server suggested, and it works - it seems to use the nonclustered index for both of the queries above.
I also tried that index but with the included columns removed, and it doesn't work... so it seems I must include all the columns I'm selecting in the index?
BTW, it also works with the index I created, if I include all the columns.
To summarize: it seems the order of the index columns didn't matter, as it worked both with Source + ShortDate and with ShortDate + Source.
But for some reason it seems to be a must to include all the columns... (won't that drastically affect writes to this table?)
Thanks a lot for reading. My goal is to understand why this happens and what I should do differently (not just the solution, as I'll need to apply it to other projects as well).
Cheers :)
Indexing in SQL Server is part know-how from long experience (and many hours of frustration), and part black magic. Don't beat yourself up over that too much - that's what a place like SO is ideal for - lots of brains, lots of experience from many hours of optimizing, that you can tap into.
I read that the index should be made by the order the query is?
If you read this: it is absolutely NOT TRUE. The order of the columns is relevant - but in a different way: a compound index (made up of multiple columns) will only ever be considered if your query specifies the n left-most columns from the index definition.
Classic example: a phone book with an index on (city, lastname, firstname). Such an index might be used:
in a query that specifies all three columns in its WHERE clause
in a query that uses city and lastname (find all "Miller" in "Detroit")
or in a query that only filters by city
but it can NEVER EVER be used if you want to search only by firstname... that's the catch with compound indexes you need to be aware of. But if you always use all the columns from an index, their ordering is typically not really relevant - the query optimizer will handle that for you.
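To make the phone-book analogy concrete, here it is as a sketch (the table and index names are made up):

```sql
CREATE TABLE dbo.PhoneBook (
    City      nvarchar(100) NOT NULL,
    LastName  nvarchar(100) NOT NULL,
    FirstName nvarchar(100) NOT NULL
);
CREATE INDEX IX_PhoneBook_City_Last_First
ON dbo.PhoneBook (City, LastName, FirstName);

-- Can seek on the index (leftmost prefix is present):
SELECT * FROM dbo.PhoneBook
WHERE City = 'Detroit' AND LastName = 'Miller';

-- Cannot seek on the index (leading column City is missing):
SELECT * FROM dbo.PhoneBook
WHERE FirstName = 'Anna';
```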
As for the included columns - those are stored only in the leaf level of the nonclustered index - they are NOT part of the search structure of the index, and you cannot specify filter values for those included columns in your WHERE clause.
The main benefit of these included columns is this: if you search in a nonclustered index, and in the end, you actually find the value you're looking for - what do you have available at that point? The nonclustered index will store the columns in the non-clustered index definition (ShortDate and Source), and it will store the clustering key (if you have one - and you should!) - but nothing else.
So in this case, once a match is found, and your query wants everything from that table, SQL Server has to do what is called a Key lookup (often also referred to as a bookmark lookup) in which it takes the clustered key and then does a Seek operation against the clustered index, to get to the actual data page that contains all the values you're looking for.
If you have included columns in your index, then the leaf level page of your non-clustered index contains
the columns as defined in the nonclustered index
the clustering key column(s)
all those additional columns as defined in your INCLUDE statement
If those columns "cover" your query, i.e. provide all the values that your query needs, then SQL Server is done once it finds the value you searched for in the nonclustered index - it can take all the values it needs from that leaf-level page of the nonclustered index, and it does NOT need to do another (expensive) key lookup into the clustered index to get the actual values.
Because of this, trying to always explicitly specify only those columns you really need in your SELECT can be beneficial: in that case, you might be able to create an efficient covering index that provides all the values for your SELECT. Always using SELECT * makes that really hard, or next to impossible.
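For the reports table from the question, a sketch of what that might look like (assuming, purely for illustration, that only Time and Email are needed besides the search columns; the index name is made up):

```sql
-- Hypothetical covering index for a trimmed column list:
CREATE NONCLUSTERED INDEX IX_reports_Source_ShortDate
ON dbo.reports ([Source], [ShortDate])
INCLUDE ([Time], [Email]);

-- This query can then be answered entirely from the index,
-- with no key lookups into the clustered index:
SELECT [Source], [ShortDate], [Time], [Email]
FROM dbo.reports
WHERE [Source] = 'name1'
  AND [ShortDate] BETWEEN '2017-10-13' AND '2017-10-15';
```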
In general, you want the index to be from most selective (i.e. filtering out the most possible records) to least selective; if a column has low cardinality, the query optimizer may ignore it.
That makes intuitive sense - if you have a phone book, and you're looking for people called "smith", with the initial "A", you want to start with searching for "smith" first, and then the "A"s, rather than all people whose initial is "A" and then filter out those called "Smith". After all, the odds are that one in 26 people have the initial "A".
So, in your example, I guess you have a wide range of values in ShortDate - so that's the first column the query optimizer tries to filter on. You say you have few distinct values in Source, so the query optimizer may decide to ignore it; in that case, the second column in that index is no use either.
The order of the predicates in your WHERE clause is irrelevant to the index - you can swap them round and get exactly the same results, so the query optimizer ignores their order.
EDIT:
So, yes, make the index. Imagine you have a pile of cards to sort - in your first pass, you want to remove as many cards as possible. Assuming it's all evenly spread: if you have 1,000 distinct short_dates over a million rows, your first pass ends up with 1,000 items if it starts on short_date; if you start with source, you end up with 100,000 rows.
The included columns of an index are for the columns you are selecting.
Because you do SELECT * (which isn't good practice), the index won't be used, since SQL Server would have to look up the whole table anyway to get the values for the remaining columns.
For your scenario, I would drop the default clustered index (if there is one) and create a new clustered index with the following statement:
USE [db]
GO
CREATE CLUSTERED INDEX CIX_reports
ON [dbo].[reports] ([ShortDate],[Source])
GO

How to create Index for this scenario in SQL Server?

What is the best Index to this Item table for this following query
select
tt.itemlookupcode,
tt.TotalQuantity,
tt.ExtendedPrice,
tt.ExtendedCost,
items.ExtendedDescription,
items.SubDescription1,
dept.Name,
categories.Name,
sup.Code,
sup.SupplierName
from
#temp_tt tt
left join HQMatajer.dbo.Item items
on items.ItemLookupCode=tt.itemlookupcode
left join HQMatajer.dbo.Department dept
ON dept.ID=items.DepartmentID
left join HQMatajer.dbo.Category categories
on categories.ID=items.CategoryID
left join HQMatajer.dbo.Supplier sup
ON sup.ID=items.SupplierID
drop table #temp_tt
I created Index like
CREATE NONCLUSTERED INDEX [JFC_ItemLookupCode_DepartmentID_CategoryID_SupplierID_INC_Description_SubDescriptions] ON [dbo].[Item]
(
[DBTimeStamp] ASC,
[ItemLookupCode] ASC,
[DepartmentID] ASC,
[CategoryID] ASC,
[SupplierID] ASC
)
INCLUDE (
[Description],
[SubDescription1]
)
But in the execution plan, when I check, it picked another index - an index that has only the TimeStamp column.
What is the best index for this scenario to that particular table.
The first column of an index should be part of the filtering, or else the index will not be used. In your index the first column is DBTimeStamp, and it is not filtered on in your query - that is the reason your index is not used.
Also, in your covering index you have included [Description] and [SubDescription1], but in the query you selected ExtendedDescription and items.SubDescription1, so there will be the additional overhead of a key/RID lookup.
Try altering your index like this:
CREATE NONCLUSTERED INDEX [JFC_ItemLookupCode_DepartmentID_CategoryID_SupplierID_INC_Description_SubDescriptions] ON [dbo].[Item]
(
[ItemLookupCode] ASC,
[DepartmentID] ASC,
[CategoryID] ASC,
[SupplierID] ASC
)
INCLUDE (
[ExtendedDescription],
[SubDescription1]
)
Having said that, the optimizer may still go for a scan or choose some other index, depending on the data retrieved from the Item table.
I'm not surprised your index isn't used. DBTimeStamp is likely to be highly selective, and is not referenced in your query at all.
You might have forgotten to include an ORDER BY clause in your query which was intended to reference DBTimeStamp. But even then, your query would probably need to scan the entire index, so it may as well scan the actual table.
The only way to make that index 'look enticing' would be to ensure it includes all the columns that are used/returned - i.e. you'd need to add ExtendedDescription. The reason this can help is that indexes typically require less storage than the full table, so they are faster to read from disk. But if the index is missing columns (in your case ExtendedDescription), the engine needs to perform an additional lookup into the full table anyway.
I can't comment why the DBTimeStamp column is preferred - you haven't given enough detail. But perhaps it's the CLUSTERED index?
Your index would be almost certain to be used if defined as:
(
[ItemLookupCode] ASC --The only thing you're actually filtering by
)
INCLUDE (
/* Moving the rest to include is most efficient for the index tree.
And by including ALL used columns, there's no need to perform
extra lookups to the full table.
*/
[DepartmentID],
[CategoryID],
[SupplierID],
[ExtendedDescription],
[SubDescription1]
)
Note however, that this kind of indexing strategy 'Find the best for each query used' is unsustainable.
You're better off finding 'narrower' indexes that are appropriate for multiple queries.
Every index slows down INSERT and UPDATE queries.
And indexes like this are impacted by more columns than the preferred 'narrower' indexes.
Index choice should focus on the selectivity of columns. I.e. Given a specific value or small range of values, what percentage of data is likely to be selected based on your queries?
In your case, I'd expect ItemLookupCode to be unique per item in the Item table. In other words, indexing by that without any includes should be sufficient. However, since you're joining to a temp table that could theoretically include all item codes, in some cases it might be better to scan the CLUSTERED INDEX anyway.
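If ItemLookupCode really is unique (an assumption - verify it against your data; the index name is made up), the narrow index could even be declared as such, which both documents and enforces the uniqueness:

```sql
-- Hypothetical: only valid if ItemLookupCode has no duplicates
CREATE UNIQUE NONCLUSTERED INDEX IX_Item_ItemLookupCode
ON dbo.Item ([ItemLookupCode]);
```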

If I have a single nonclustered index on a table, will the number of columns I include change the slow down when writing to it?

On the exact same table, if I was to put one index on it, either:
CREATE INDEX ix_single ON MyTable (uid asc) include (columnone)
or:
CREATE INDEX ix_multi ON MyTable (uid asc) include (
columnone,
columntwo,
columnthree,
....
columnX
)
Would the second index cause an even greater lag on how long it takes to write to the table than the first one? And why?
Included columns need more disk space, as well as more time for data manipulation.
If there is a clustered index on this table too (ideally on an implicitly sorted column, such as an IDENTITY column, to avoid fragmentation), it will serve as a fast lookup for all columns (but you must create the clustered index before the other one...).
Including columns in an index is a useful approach only for extremely performance-critical cases.
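One way to see the space cost for yourself (a generic sketch - substitute your own table name for the hypothetical dbo.MyTable):

```sql
-- Reports the on-disk size of each index on a table,
-- so you can compare the narrow and wide variants:
SELECT i.name AS index_name,
       SUM(ps.used_page_count) * 8 AS used_kb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID('dbo.MyTable')
GROUP BY i.name;
```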

How should I create this index?

I have queries that look like:
select blah, foo
from tableA
where makedate = #somedate
select bar, baz
from tableA
where vendorid = #someid
select foobar, onetwo
from tableA
where vendorid = #someid and makedate between #date1 and #date2
Should I create just one index:
create nonclustered index searches_index on tableA(vendorid, makedate)
Should I create 3 indexes:
create nonclustered index searches_index on tableA(vendorid,
makedate)
create nonclustered index searches_index on tableA(vendorid)
create nonclustered index searches_index on tableA(makedate)
Also are these two different indexes? In other words, does column order matter?
create nonclustered index searches_index on tableA(vendorid, makedate)
create nonclustered index searches_index on tableA(makedate, vendorid)
I've been reading up on indexes, but I'm not sure of the best way to create them.
Neither of your suggestions is optimal.
You should create two indexes:
create nonclustered index searches_index on tableA(vendorid, makedate)
create nonclustered index searches_index on tableA(makedate)
The reason is that the first index, on (vendorid, makedate), will be used for both the second and third of your sample queries; an index on (vendorid) alone would be redundant.
[Edit] To answer your additional question:
Yes, column order does matter in index creation. An index on (vendorid, makedate) can be used to optimize queries of the form WHERE vendorid = ? AND makedate = ? or WHERE vendorid = ? but cannot help with the query WHERE makedate = ?. In order to get any significant index optimization on the last query you would need an index with makedate at the head of the index. (Note that in my example queries "=" means any optimizable condition).
There exist some edge cases in which an otherwise unhelpful index (like (vendorid, makedate) in a query against makedate only) can provide some nominal help in returning data, as @Bram points out in the comments. For instance, if you return only the columns makedate and vendorid in that query, then the SQL engine can treat the index as a mini-table and sequentially scan it to find the matching rows, never having to look at the full copy of the table. This is called a covering index.
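A sketch of that edge case against the question's table (the date literal is purely illustrative): even though (vendorid, makedate) cannot seek on makedate alone, a query touching only those two columns can be answered by scanning just the smaller index:

```sql
-- No index seek is possible here, but the query is fully covered,
-- so the engine can scan the index instead of the whole table:
SELECT vendorid, makedate
FROM tableA
WHERE makedate = '2017-01-01';
```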
If it were me, and I knew that the table was always going to be queried in one of those three ways you listed, I would create three indexes as you suggested. Those are pretty light-weight indexes to have, so I wouldn't be concerned (generally speaking) about the other costs that multiple indexes will incur.
Just my opinion, I'm sure there are others contrary.
Also, a side note: when creating the 3 indexes, each will need a unique name.
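For example (these names are just a suggestion), the three indexes from the question could be created as:

```sql
create nonclustered index ix_tableA_vendorid_makedate on tableA (vendorid, makedate)
create nonclustered index ix_tableA_vendorid on tableA (vendorid)
create nonclustered index ix_tableA_makedate on tableA (makedate)
```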