How should I create this index? - sql

I have queries that look like:
select blah, foo
from tableA
where makedate = @somedate
select bar, baz
from tableA
where vendorid = @someid
select foobar, onetwo
from tableA
where vendorid = @someid and makedate between @date1 and @date2
Should I create just one index:
create nonclustered index searches_index on tableA(vendorid, makedate)
Should I create 3 indexes:
create nonclustered index searches_index on tableA(vendorid,
makedate)
create nonclustered index searches_index on tableA(vendorid)
create nonclustered index searches_index on tableA(makedate)
Also are these two different indexes? In other words, does column order matter?
create nonclustered index searches_index on tableA(vendorid, makedate)
create nonclustered index searches_index on tableA(makedate, vendorid)
I've been reading up on indexes, but I'm not sure of the best way to create them.

Neither of your suggestions is optimal.
You should create two indexes:
create nonclustered index searches_index_vendor_date on tableA(vendorid, makedate)
create nonclustered index searches_index_date on tableA(makedate)
The reason is that the first index on (vendorid, makedate) can be used for both the second and third of your sample queries; an index on (vendorid) only would be redundant.
[Edit] To answer your additional question:
Yes, column order does matter in index creation. An index on (vendorid, makedate) can be used to optimize queries of the form WHERE vendorid = ? AND makedate = ? or WHERE vendorid = ? but cannot help with the query WHERE makedate = ?. In order to get any significant index optimization on the last query you would need an index with makedate at the head of the index. (Note that in my example queries "=" means any optimizable condition).
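As a rough sketch of the column-order point, assuming the tableA from the question, the following shows which of the three queries can seek on which index. The index names, variable types, and values are illustrative assumptions, not part of the original post.

```sql
-- Illustrative index names; tableA and its columns come from the question.
CREATE NONCLUSTERED INDEX searches_index_vendor_date ON tableA (vendorid, makedate);
CREATE NONCLUSTERED INDEX searches_index_date        ON tableA (makedate);

-- Assumed types and values, for illustration only.
DECLARE @someid int = 42;
DECLARE @date1 date = '20240101';
DECLARE @date2 date = '20240131';

-- Leading column vendorid is constrained: can seek on (vendorid, makedate).
SELECT bar, baz
FROM tableA
WHERE vendorid = @someid;

-- Equality on the leading column plus a range on the second column:
-- can also seek on (vendorid, makedate).
SELECT foobar, onetwo
FROM tableA
WHERE vendorid = @someid AND makedate BETWEEN @date1 AND @date2;

-- vendorid is unconstrained, so (vendorid, makedate) cannot be used for a seek;
-- this query needs the index that has makedate at its head.
SELECT blah, foo
FROM tableA
WHERE makedate = @date1;
```

Comparing the actual execution plans for these three statements is the reliable way to confirm what the optimizer does on your data.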
There exist some edge cases in which an otherwise unhelpful index (like (vendorid, makedate) in a query against makedate only) can provide some nominal help in returning data, as @Bram points out in the comments. For instance, if you return only the columns makedate and vendorid in that query then the SQL engine can treat the index as a mini-table and sequentially scan that to find the matching rows, never having to look at the full copy of the table. This is called a covering index.
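A minimal sketch of the covering case, again assuming the tableA from the question. The index name, the variable type, and the value are made up for illustration.

```sql
DECLARE @somedate date = '20240115';   -- assumed type and value

-- Only makedate and vendorid are returned, so an index on (vendorid, makedate)
-- holds every column the query needs. The engine can scan that index
-- instead of the wider base table, although it still cannot seek on makedate.
SELECT makedate, vendorid
FROM tableA
WHERE makedate = @somedate;

-- To get a seek and still avoid lookups into the table, key the index on
-- makedate and carry the selected columns along with INCLUDE.
CREATE NONCLUSTERED INDEX searches_index_date_covering   -- illustrative name
    ON tableA (makedate)
    INCLUDE (blah, foo);

SELECT blah, foo
FROM tableA
WHERE makedate = @somedate;
```

The second index trades extra storage and write cost for the saved lookups, so it is only worth it if that query runs often enough.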

If it were me, and I knew that the table was always going to be queried in one of those three ways you listed, I would create three indexes as you suggested. Those are pretty light-weight indexes to have, so I wouldn't be concerned (generally speaking) about the other costs that multiple indexes will incur.
Just my opinion, I'm sure there are others contrary.
Also, a side note: when creating 3 indexes, each one needs a unique name.

Related

Adding specific index to SQL Server table to improve performance

I have a slow query on a table.
SELECT (some columns)
FROM Table
This table has an ID (integer, identity (1,1)) primary index which is the only index on this table.
The query has a WHERE clause:
WHERE Field05 <> 1
AND (Field01 LIKE '%something%' OR Field02 LIKE '%something%' OR
Field03 LIKE '%something%' OR Field04 LIKE '%something%')
Field05 is bit, not null
Field01 is NVarchar(255)
Field02 is NVarchar(255)
Field03 is Nchar(11)
Field04 is Varchar(50)
The execution plan shows a "Clustered index scan" resulting in a slow execution.
I tried adding indexes:
CREATE NONCLUSTERED INDEX IX_Aziende_RagSoc ON dbo.Aziende (Field01);
CREATE NONCLUSTERED INDEX IX_Aziende_Nome ON dbo.Aziende (Field02);
CREATE NONCLUSTERED INDEX IX_Aziende_PIVA ON dbo.Aziende (Field03);
CREATE NONCLUSTERED INDEX IX_Aziende_CodFisc ON dbo.Aziende (Field04);
CREATE NONCLUSTERED INDEX IX_Aziende_Eliminata ON dbo.Aziende (Field05);
Same performance, and again the execution plan shows a "Clustered index scan".
I removed these 5 indexes and added only ONE index:
CREATE NONCLUSTERED INDEX IX_Aziende_Ricerca
ON Aziende (Field05)
INCLUDE (Field01, Field02, Field03, Field04)
Same performance, but this time the execution plan changes.
It is more complex, but still slow.
I removed this index and added a different index:
CREATE NONCLUSTERED INDEX IX_Aziende_Ricerca
ON Aziende (Field05,Field01,Field02,Field03,Field04)
Same performance; the execution plan remains the same as in the previous attempt.
The execution is still slow.
I have no other ideas... can someone help?
This is too long for a comment.
First, you should use Field05 = 0 rather than Field05 <> 1. Equality is both easier to read and better for the optimizer. It won't make a difference in this particular case, unless you have a clustered index starting with Field05 or if almost all values are 1 (that is, the 0 is highly selective).
Second, in general, you can only optimize string pattern matching using a full text index. This in turn has other limitations, such as looking for words or prefixes (but not starting with wildcards).
The one exception is if "something" is a constant. In that case, you could add persisted computed columns with indexes to capture whether the value is present in these columns. However, I'm guessing that "something" is not constant.
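For what it's worth, here is a sketch of that persisted computed column idea. It assumes the search term really is the fixed literal 'something'; the column name HasSomething and the index name are hypothetical. CHARINDEX is used in place of LIKE as an equivalent substring test for a literal with no wildcard characters.

```sql
-- Assumption: the search term is always the literal 'something'.
-- HasSomething and IX_Aziende_HasSomething are hypothetical names.
ALTER TABLE dbo.Aziende
    ADD HasSomething AS CONVERT(bit,
        CASE WHEN CHARINDEX(N'something', Field01) > 0
               OR CHARINDEX(N'something', Field02) > 0
               OR CHARINDEX(N'something', Field03) > 0
               OR CHARINDEX(N'something', Field04) > 0
             THEN 1 ELSE 0 END) PERSISTED;
GO

-- Index the flag together with the Field05 filter.
CREATE NONCLUSTERED INDEX IX_Aziende_HasSomething
    ON dbo.Aziende (HasSomething, Field05);
GO

-- The pattern matching is paid once at write time, not on every search.
SELECT ID
FROM dbo.Aziende
WHERE HasSomething = 1 AND Field05 = 0;
```

This only pays off if rows containing the term are a small share of the table; otherwise the optimizer will likely still scan.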
That leaves you with full text indexes or with reconsidering your data model. Perhaps you are storing things in strings -- like lists of tags -- that should really be in a separate table.
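A rough outline of the full-text route, assuming the Full-Text Search feature is installed. The catalog name ftAziende and the key index name PK_Aziende are assumptions; substitute the real name of the unique index on ID.

```sql
-- Hypothetical catalog name.
CREATE FULLTEXT CATALOG ftAziende;
GO

-- PK_Aziende is an assumed name for the unique, single-column index on ID.
CREATE FULLTEXT INDEX ON dbo.Aziende (Field01, Field02, Field03, Field04)
    KEY INDEX PK_Aziende
    ON ftAziende;
GO

-- Matches words that START with 'something' in any of the four columns.
-- Unlike LIKE '%something%', it does not match the term in the middle of a word.
SELECT ID
FROM dbo.Aziende
WHERE Field05 = 0
  AND CONTAINS((Field01, Field02, Field03, Field04), N'"something*"');
```

Whether word and prefix matching is an acceptable replacement for a substring search depends on what users actually type into the search box.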
Just to chime in with a few comments.
SQL Server tends to scan the table even if an index is present, unless it estimates that the searched value matches only a small fraction of the rows (roughly under 1%). With this in mind, a plain index on a bit field is unlikely to be of any value (with an even split, each value matches about 50% of the rows).
One option you might consider is to create a filtered index (WHERE Field05 = 0). Then you can include your other fields in this index.
Note this will only help you if you are not selecting any other columns from the table.
Can you check what proportion of your data has Field05 = 0? If this is small (e.g. under 10%) then a filtered index might help.
I can't see any way that you can avoid a scan of some sort though. The best you can get is probably an index scan.
Another option (essentially the same thing!) is to create a schema-bound indexed view with all the columns you need and with the Field05 = 0 filter hardcoded into the view.
Again, unless you are certain that the selected column list is going to be a tiny proportion of the columns in the table, SQL Server will probably be faster with a table scan. If you were only ever selecting a handful of columns from a very wide table then an index covering these columns might help: even though it will still be a scan, there will be more rows per page than when scanning the full table.
So in summary: if you can guarantee that a small subset of the table's columns will be selected AND Field05 = 0 represents a minority of the rows in the table, then a filtered index with includes can be of value.
E.g.
CREATE NONCLUSTERED INDEX ix ON dbo.Aziende(ID) INCLUDE (Field01,Field02,Field03,Field04, [other cols used by select]) WHERE (Field05=0)
Good Luck!
After a lot of struggling I gave up on the idea of adding an index.
Nothing changes with an index.
I changed the C# code that builds the query, and now I try to understand the meaning of the "something" parameter received from the function.
If it is of type 1, then I build a WHERE on Field01
If it is of type 2, then I build a WHERE on Field02
If it is of type 3, then I build a WHERE on Field03
If it is of type 4, then I build a WHERE on Field04
This way, the execution time becomes 1/4 of what it was before.
Customers are satisfied.
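The original fix lives in C# and was not posted, so the following is only a T-SQL sketch of the same idea: decide which field the search text belongs to, then filter on that one field. How the type is determined is not shown in the post; @type here is a hypothetical input, and ID stands in for the real column list.

```sql
-- Assumed inputs: the search text and a hypothetical type code (1 to 4).
DECLARE @something nvarchar(255) = N'abc';
DECLARE @type int = 3;
DECLARE @pattern nvarchar(257) = N'%' + @something + N'%';

-- One LIKE per query instead of four ORed together.
IF @type = 1
    SELECT ID FROM dbo.Aziende WHERE Field05 = 0 AND Field01 LIKE @pattern;
ELSE IF @type = 2
    SELECT ID FROM dbo.Aziende WHERE Field05 = 0 AND Field02 LIKE @pattern;
ELSE IF @type = 3
    SELECT ID FROM dbo.Aziende WHERE Field05 = 0 AND Field03 LIKE @pattern;
ELSE IF @type = 4
    SELECT ID FROM dbo.Aziende WHERE Field05 = 0 AND Field04 LIKE @pattern;
```

Each branch still scans because of the leading wildcard, but it evaluates one pattern per row instead of four, which is consistent with the roughly 4x improvement reported.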

How to create Index for this scenario in SQL Server?

What is the best index on the Item table for the following query?
select
tt.itemlookupcode,
tt.TotalQuantity,
tt.ExtendedPrice,
tt.ExtendedCost,
items.ExtendedDescription,
items.SubDescription1,
dept.Name,
categories.Name,
sup.Code,
sup.SupplierName
from
#temp_tt tt
left join HQMatajer.dbo.Item items
on items.ItemLookupCode=tt.itemlookupcode
left join HQMatajer.dbo.Department dept
ON dept.ID=items.DepartmentID
left join HQMatajer.dbo.Category categories
on categories.ID=items.CategoryID
left join HQMatajer.dbo.Supplier sup
ON sup.ID=items.SupplierID
drop table #temp_tt
I created an index like this:
CREATE NONCLUSTERED INDEX [JFC_ItemLookupCode_DepartmentID_CategoryID_SupplierID_INC_Description_SubDescriptions] ON [dbo].[Item]
(
[DBTimeStamp] ASC,
[ItemLookupCode] ASC,
[DepartmentID] ASC,
[CategoryID] ASC,
[SupplierID] ASC
)
INCLUDE (
[Description],
[SubDescription1]
)
But when I check the execution plan, a different index is picked, one that has only the timestamp column.
What is the best index on that table for this scenario?
The first column of an index should be part of the filter or join condition, otherwise the index will not be used for a seek. In your index the first column is DBTimeStamp, and it is not filtered on in your query. That is the reason your index is not used.
Also, your index includes [Description] and [SubDescription1], but the query selects ExtendedDescription and items.SubDescription1. This adds the overhead of a key/RID lookup.
Try altering your index like this:
CREATE NONCLUSTERED INDEX [JFC_ItemLookupCode_DepartmentID_CategoryID_SupplierID_INC_Description_SubDescriptions] ON [dbo].[Item]
(
[ItemLookupCode] ASC,
[DepartmentID] ASC,
[CategoryID] ASC,
[SupplierID] ASC
)
INCLUDE (
[ExtendedDescription],
[SubDescription1]
)
Having said all that, the optimizer may still go for a scan or choose some other index, depending on how much data is retrieved from the Item table.
I'm not surprised your index isn't used. DBTimeStamp is likely to be highly selective, and is not referenced in your query at all.
You might have forgotten to include an ORDER BY clause in your query which was intended to reference DBTimeStamp. But even then your query would probably need to scan the entire index. So it may as well scan the actual table.
The only way to make that index 'look enticing' would be to ensure it includes all columns that are used/returned. I.e. You'd need to add ExtendedDescription. The reason this can help is that indexes typically require less storage than the full table. So it's faster to read from disk. But if you're missing columns (in your case ExtendedDescription), then the engine needs to perform an additional lookup onto the full table in any case.
I can't comment why the DBTimeStamp column is preferred - you haven't given enough detail. But perhaps it's the CLUSTERED index?
Your index would be almost certain to be used if defined as:
CREATE NONCLUSTERED INDEX [IX_Item_ItemLookupCode_Covering] ON [dbo].[Item] --Illustrative name
(
[ItemLookupCode] ASC --The only thing you're actually filtering by
)
INCLUDE (
/* Moving the rest to include is most efficient for the index tree.
And by including ALL used columns, there's no need to perform
extra lookups to the full table.
*/
[DepartmentID],
[CategoryID],
[SupplierID],
[ExtendedDescription],
[SubDescription1]
)
Note however, that this kind of indexing strategy 'Find the best for each query used' is unsustainable.
You're better off finding 'narrower' indexes that are appropriate for multiple queries.
Every index slows down INSERT and UPDATE queries.
And indexes like this are impacted by more columns than the preferred 'narrower' indexes.
Index choice should focus on the selectivity of columns. I.e. Given a specific value or small range of values, what percentage of data is likely to be selected based on your queries?
In your case, I'd expect ItemLookupCode to be unique per item in the Items table. In other words indexing by that without any includes should be sufficient. However, since you're joining to a temp table that theoretically could include all item codes: in some cases it might be better to scan the CLUSTERED INDEX in any case.
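One quick way to test the uniqueness guess before settling on an index is to measure it. This is just a sketch; the index name at the end is a made-up example.

```sql
-- A distinct_ratio close to 1.0 means ItemLookupCode is (nearly) unique per row,
-- i.e. highly selective. NULLIF guards against dividing by zero on an empty table.
SELECT COUNT(*)                       AS total_rows,
       COUNT(DISTINCT ItemLookupCode) AS distinct_codes,
       COUNT(DISTINCT ItemLookupCode) * 1.0 / NULLIF(COUNT(*), 0) AS distinct_ratio
FROM HQMatajer.dbo.Item;

-- If the ratio is close to 1.0, a narrow index with no includes may be enough.
-- IX_Item_ItemLookupCode is a hypothetical name.
CREATE NONCLUSTERED INDEX IX_Item_ItemLookupCode
    ON dbo.Item (ItemLookupCode);
```

A narrow index like this is cheap to maintain and can serve other queries that look items up by code.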

If I have a single nonclustered index on a table, will the number of columns I include change the slow down when writing to it?

On the exact same table, if I was to put one index on it, either:
CREATE INDEX ix_single ON MyTable (uid asc) include (columnone)
or:
CREATE INDEX ix_multi ON MyTable (uid asc) include (
columnone,
columntwo,
columnthree,
....
columnX
)
Would the second index cause an even greater lag on how long it takes to write to the table than the first one? And why?
Included columns need more disk space, as well as more time on data manipulation...
If there is a clustered index on this table too (ideally on an implicitly sorted column like an IDENTITY column, to avoid fragmentation), it will serve as a fast lookup for all columns (but it is best to create the clustered index before the other one, because adding it later rebuilds the nonclustered index...)
Including columns in an index is a useful approach only when performance is really critical...
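If you want to see the disk space side of this for yourself, something like the following should work after creating each variant in turn. It assumes the table lives in the dbo schema and relies only on the standard catalog views.

```sql
-- Approximate size of every index on the table, in KB (pages are 8 KB each).
-- A wider INCLUDE list shows up as more used pages for the same row count.
SELECT i.name                      AS index_name,
       SUM(ps.used_page_count) * 8 AS used_kb,
       SUM(ps.row_count)           AS row_count
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
  ON ps.object_id = i.object_id
 AND ps.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID(N'dbo.MyTable')   -- assumed schema
GROUP BY i.name;
```

More pages per index means more data to write on INSERT, and any UPDATE that touches an included column has to maintain the index as well, which is where the extra write cost of ix_multi comes from.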

after adding index execution plan showing index is missing

I am working on SQL Server 2008. I created indexes on my Transaction_tbl like this:
CREATE INDEX INloc ON transaction_tbl
(
Locid ASC
)
CREATE INDEX INDtim ON transaction_tbl
(
DTime ASC
)
CREATE INDEX INStatus ON transaction_tbl
(
Status ASC
)
CREATE INDEX INDelda ON transaction_tbl
(
DelDate ASC
)
Then I checked the execution plan of my SQL query, but it reports that an index is missing.
My query looks like this:
SELECT [transactID]
,[TBarcode]
,[cmpid]
,[Locid]
,[PSID]
,[PCID]
,[PCdID]
,[PlateNo]
,[vtid]
,[Compl]
,[self]
,[LstTic]
,[Gticket]
,[Cticket]
,[Ecode]
,[dtime]
,[LICID]
,[PAICID]
,[Plot]
,[mkid]
,[mdlid]
,[Colid]
,[Comments]
,[Kticket]
,[PAmount]
,[Payid]
,[Paid]
,[Paydate]
,[POICID]
,[DelDate]
,[DelEcode]
,[PAICdate]
,[KeyRoomDate]
,[Status]
FROM [dbo].[Transaction_tbl] where locid='9'
I want to know why, after adding the indexes, the execution plan still reports a missing index. If I write my where condition like this:
where locid= 9 and status=5 and DelDate='2014-04-05 10:10:00' and dtime='2014-04-05 10:10:00'
then it does not report a missing index. How come? If anyone knows, please help me clarify this.
The missing index details look like this:
/*
Missing Index Details from SQLQuery14.sql - WIN7-PC.Vallett (sa (67))
The Query Processor estimates that implementing the following index could improve the query cost by 71.5363%.
*/
/*
USE [Vallett]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Transaction_tbl] ([Locid])
INCLUDE ([transactID],[TBarcode],[cmpid],[PSID],[PCID],[PCdID],[PlateNo],[vtid],[Compl],[self],[LstTic],[Gticket],[Cticket],[Ecode],[dtime],[LICID],[PAICID],[Plot],[mkid],[mdlid],[Colid],[Comments],[Kticket],[PAmount],[Payid],[Paid],[Paydate],[POICID],[DelDate],[DelEcode],[PAICdate],[KeyRoomDate],[Status])
GO
*/
The index that SQL Server is suggesting is not the same as the one you already have. The one you have is on LocID alone, while the one SQL Server suggests also includes a whole bunch of extra columns so that it would become a covering index, i.e. your SELECT statement could be satisfied from just the index.
With your existing index, the SELECT might use the INloc index to find a row - but to be able to return all those columns you're selecting, it will need to do rather expensive key lookups into the actual table data (most likely into the clustered index).
If you added the index suggested by SQL Server, since that index itself would contain all those columns needed, then no key lookups into the data pages would be necessary and thus the operation would be much faster.
So maybe you could check if you really need all those columns in your SELECT and if not - trim the list of columns.
However: adding this many columns to your INCLUDE list is usually not a good idea. I think the index suggestion here is not useful and I'd ignore that.
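To illustrate the "trim the column list" route: if the caller only needs a handful of columns, a much narrower covering index becomes reasonable. Which columns are actually needed is a guess here, and the index name is made up.

```sql
-- Hypothetical: suppose only these five columns are really needed.
CREATE NONCLUSTERED INDEX IX_Transaction_tbl_Locid_Narrow
    ON dbo.Transaction_tbl (Locid)
    INCLUDE (transactID, TBarcode, PlateNo, dtime, Status);

-- This SELECT can now be answered from the index alone, with no key lookups.
SELECT transactID, TBarcode, PlateNo, dtime, Status
FROM dbo.Transaction_tbl
WHERE Locid = 9;
```

With five included columns instead of thirty-three, the index stays small and cheap to maintain on writes.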
Update: also, if your SELECT always contains these four criteria
WHERE locid = 9 AND status = 5 AND DelDate='2014-04-05 10:10:00' AND dtime='2014-04-05 10:10:00'
then maybe you could try to have a single index that contains all four columns:
CREATE INDEX LocStatusDates
ON transaction_tbl(Locid, Status, DelDate, dtime)

How to optimise slow SQL query

I need help optimising this query. In a stored procedure this part runs for 1 hour (the whole procedure needs 2 hours to execute). The procedure works on a large amount of data. The query works with two temporary tables. Both have indexes:
create unique clustered index #cx_tDuguje on #tDuguje (Partija, Referenca, Konto, Valuta, DatumValute)
create nonclustered index #cx_tDuguje_1 on #tDuguje (Partija, Valuta, Referenca, Konto, sIznos)
create unique clustered index #cx_tPotrazuje on #tPotrazuje (Partija, Referenca, Konto, Valuta, DatumValute)
create nonclustered index #cx_tPotrazuje_1 on #tPotrazuje (Partija, Valuta, Referenca, Konto, pIznos)
And this is a query:
select D.Partija,
D.Referenca,
D.Konto,
D.Valuta,
D.DatumValute DatumZad,
NULLIF(MAX(COALESCE(P.DatumValute, @NextDay)), @NextDay) DatumUpl,
MAX(D.DospObaveze) DospObaveze,
MAX(D.LimitMatZn) LimitMatZn
into #dwkKasnjenja_WNT
from #tDuguje D
left join #tPotrazuje P on D.Partija = P.Partija
AND D.Valuta = p.Valuta
AND D.Referenca = p.Referenca
AND D.Konto = P.Konto
and P.pIznos < D.sIznos and D.sIznos <= P.Iznos
WHERE 1=1
AND D.DatumValute IS NOT NULL
GROUP BY D.Partija, D.Referenca, D.Konto, D.Valuta, D.DatumValute
I have an execution plan, but I am not able to post it here.
Just an idea: If you are permitted to do so, try to change the business logic first.
Maybe you can constrain the result set to include only data from, say, a meaningful point in time onwards.
How far back in time do your account data reach? Do you really need to include all data all the way back from good old 1999?
Maybe you can say
D.DatumValute >= '20100101'
or similar, in your WHERE clause
And this might create a much smaller temporary result set that is used in your complicated JOIN clause which will then run faster.
If you can't do this, maybe do a
"select top 1000 ... order by datum desc" query, which might run faster, and then, if the user really needs to, perform the slow-running query in a second step.
Difficult to say without having an execution plan or some hints about the number of rows in each table.
Replace this index
create nonclustered index #cx_tPotrazuje_1 on #tPotrazuje (Partija, Valuta, Referenca, Konto, pIznos)
by
create nonclustered index #cx_tPotrazuje_1 on #tPotrazuje (Partija, Valuta, Referenca, Konto, sIznos, pIznos)
Creating indexes is an expensive process and can sometimes be slow, depending on the workload of the instance and on the columns the index is created on.
Furthermore, it's very difficult to say what you need to optimise without an execution plan and without knowing something about the data types of the columns involved in the index creation.
For example, the data types of the columns Partija, Referenca, Konto, Valuta, DatumValute are not so clear.
You should tell us the data types of the columns involved in the creation of your indexes.
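If it helps, the column types and row counts of the temp tables can be pulled from the tempdb catalog views. This is only a sketch, and it has to run in the same session that created #tDuguje and #tPotrazuje.

```sql
-- Column names and data types of #tDuguje (max_length is in bytes).
-- Repeat with #tPotrazuje for the second table.
SELECT c.name AS column_name,
       t.name AS type_name,
       c.max_length,
       c.precision,
       c.scale,
       c.is_nullable
FROM tempdb.sys.columns AS c
JOIN tempdb.sys.types   AS t
  ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(N'tempdb..#tDuguje')
ORDER BY c.column_id;

-- Row counts give a sense of how large the join really is.
SELECT COUNT(*) AS duguje_rows    FROM #tDuguje;
SELECT COUNT(*) AS potrazuje_rows FROM #tPotrazuje;
```

Posting that output together with the execution plan would make it much easier to suggest a suitable index.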