Adding specific index to SQL Server table to improve performance


I have a slow query on a table.
SELECT (some columns)
FROM Table
This table has an ID (integer, identity(1,1)) primary key, which is the only index on this table.
The query has a WHERE clause:
WHERE Field05 <> 1
AND (Field01 LIKE '%something%' OR Field02 LIKE '%something%' OR
Field03 LIKE '%something%' OR Field04 LIKE '%something%')
Field05 is bit, not null
Field01 is NVarchar(255)
Field02 is NVarchar(255)
Field03 is Nchar(11)
Field04 is Varchar(50)
The execution plan shows a "Clustered index scan", and the query runs slowly.
I tried adding indexes:
CREATE NONCLUSTERED INDEX IX_Aziende_RagSoc ON dbo.Aziende (Field01);
CREATE NONCLUSTERED INDEX IX_Aziende_Nome ON dbo.Aziende (Field02);
CREATE NONCLUSTERED INDEX IX_Aziende_PIVA ON dbo.Aziende (Field03);
CREATE NONCLUSTERED INDEX IX_Aziende_CodFisc ON dbo.Aziende (Field04);
CREATE NONCLUSTERED INDEX IX_Aziende_Eliminata ON dbo.Aziende (Field05);
Performance was the same, and again the execution plan showed a "Clustered index scan".
I removed these 5 indexes and added only ONE index:
CREATE NONCLUSTERED INDEX IX_Aziende_Ricerca
ON Aziende (Field05)
INCLUDE (Field01, Field02, Field03, Field04)
Performance was the same, but this time the execution plan changed.
It is more complex, but still slow.
I removed this index and added a different index:
CREATE NONCLUSTERED INDEX IX_Aziende_Ricerca
ON Aziende (Field05,Field01,Field02,Field03,Field04)
Performance was the same, and the execution plan stayed as in the previous case.
The query is still slow.
I have no other ideas... can anyone help?

This is too long for a comment.
First, you should use Field05 = 0 rather than Field05 <> 1. Equality is both easier to read and better for the optimizer. It won't make a difference in this particular case, unless you have a clustered index starting with Field05 or if almost all values are 1 (that is, the 0 is highly selective).
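For illustration, here is the question's filter rewritten with the equality predicate. The column list is an assumption, since the question elides it as "(some columns)". Because Field05 is a NOT NULL bit column, Field05 = 0 returns exactly the same rows as Field05 <> 1.

```sql
-- Same filter as in the question, with Field05 = 0 instead of Field05 <> 1
-- (equivalent here because Field05 is a NOT NULL bit column: only 0 or 1)
SELECT ID, Field01, Field02, Field03, Field04   -- assumed column list
FROM dbo.Aziende
WHERE Field05 = 0
  AND (Field01 LIKE '%something%' OR Field02 LIKE '%something%'
       OR Field03 LIKE '%something%' OR Field04 LIKE '%something%');
```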
Second, in general, you can only optimize string pattern matching using a full text index. This in turn has other limitations: it matches words or word prefixes, not patterns that start with a wildcard (such as '%something%').
The one exception is if "something" is a constant. In that case, you could add persisted computed columns with indexes to capture whether the value is present in these columns. However, I'm guessing that "something" is not constant.
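A minimal sketch of that idea, assuming "something" is the fixed literal 'something'. The column name HasSomething and the index name IX_Aziende_HasSomething are hypothetical. Indexing a computed column also requires the usual SET options (for example ANSI_NULLS and QUOTED_IDENTIFIER ON), which SSMS uses by default.

```sql
-- Hypothetical persisted flag: 1 when any of the four columns contains the fixed term
ALTER TABLE dbo.Aziende ADD HasSomething AS
    CAST(CASE WHEN Field01 LIKE N'%something%'
               OR Field02 LIKE N'%something%'
               OR Field03 LIKE N'%something%'
               OR Field04 LIKE '%something%'
              THEN 1 ELSE 0 END AS bit) PERSISTED;
GO
-- Index the flag together with Field05 so the search can become a seek
CREATE NONCLUSTERED INDEX IX_Aziende_HasSomething
ON dbo.Aziende (HasSomething, Field05);
GO
-- The query now filters on the flag instead of the four LIKE patterns
SELECT ID
FROM dbo.Aziende
WHERE HasSomething = 1 AND Field05 = 0;
```

The trade-off is that the flag is recomputed on every insert and update, and each new search term would need its own column, which is why this only makes sense for a constant term.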
That leaves you with full text indexes or with reconsidering your data model. Perhaps you are storing things in strings -- like lists of tags -- that should really be in a separate table.
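If the Full-Text Search feature is installed, a sketch could look like this. The catalog name ftAziende and the key index name PK_Aziende are assumptions: KEY INDEX must name a unique, single-column, non-nullable index, such as the primary key on ID. Note that "something*" is a prefix search on words; it will not find "something" in the middle of a word the way LIKE '%something%' does.

```sql
-- Hypothetical full-text catalog (requires the Full-Text Search feature)
CREATE FULLTEXT CATALOG ftAziende;
GO
-- KEY INDEX must be a unique, single-column, non-nullable index (assumed: the PK on ID)
CREATE FULLTEXT INDEX ON dbo.Aziende (Field01, Field02, Field03, Field04)
    KEY INDEX PK_Aziende
    ON ftAziende;
GO
-- Prefix search: words starting with "something" in any of the four columns
SELECT ID
FROM dbo.Aziende
WHERE Field05 = 0
  AND CONTAINS((Field01, Field02, Field03, Field04), '"something*"');
```

Full-text population runs in the background, so the query may return no rows until the initial population has finished.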

Just to chime in with a few comments.
SQL Server tends to scan the table even if an index is present, unless it estimates that the searched value matches only a small fraction of the rows (roughly under 1%). With this in mind, an index on a bit column is rarely of value on its own: with only two possible values, an even split means each value matches about 50% of the rows.
One option you might consider is to create a filtered index (WHERE Field05 = 0); then you can include your other fields in this index.
Note that this will only help if you are not selecting any other columns from the table.
Can you check what proportion of your data has Field05 = 0? If this is small (e.g. under 10%), then a filtered index might help.
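A quick way to answer that question, assuming the dbo.Aziende table from the question:

```sql
-- Share of rows a filtered index WHERE Field05 = 0 would contain
SELECT COUNT(*)                                         AS total_rows,
       SUM(CASE WHEN Field05 = 0 THEN 1 ELSE 0 END)     AS rows_field05_zero,
       100.0 * SUM(CASE WHEN Field05 = 0 THEN 1 ELSE 0 END)
             / NULLIF(COUNT(*), 0)                      AS pct_field05_zero
FROM dbo.Aziende;
```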
I can't see any way that you can avoid a scan of some sort, though; the best you can get is probably an index scan.
Another option (essentially the same thing!) is to create a schema-bound indexed view with all the columns you need and with the Field05 = 0 filter hardcoded into the view.
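A sketch of such a view, assuming ID and the four search columns are all the query needs. The view name dbo.vAziendeAttive and the index name CIX_vAziendeAttive are hypothetical.

```sql
-- Schema-bound view with the Field05 = 0 filter hardcoded (hypothetical name)
CREATE VIEW dbo.vAziendeAttive
WITH SCHEMABINDING
AS
SELECT ID, Field01, Field02, Field03, Field04
FROM dbo.Aziende          -- SCHEMABINDING requires two-part names
WHERE Field05 = 0;
GO
-- A unique clustered index materializes the view's rows
CREATE UNIQUE CLUSTERED INDEX CIX_vAziendeAttive
ON dbo.vAziendeAttive (ID);
GO
-- NOEXPAND tells the optimizer to read the view's own index
SELECT ID, Field01, Field02, Field03, Field04
FROM dbo.vAziendeAttive WITH (NOEXPAND)
WHERE Field01 LIKE '%something%' OR Field02 LIKE '%something%'
   OR Field03 LIKE '%something%' OR Field04 LIKE '%something%';
```

On editions other than Enterprise, the optimizer generally uses an indexed view's index only when the query references the view with the NOEXPAND hint, which is why it appears in the query.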
Again, unless you are certain that the selected column list is going to be a tiny proportion of the columns in the table, SQL Server will probably be faster with a table scan. If you were only ever selecting a handful of columns from a very wide table, then an index covering these columns might help: even though it will still be a scan, there will be more rows per page than when scanning the full table.
So in summary: if you can guarantee that only a small subset of the table's columns will be selected
AND Field05 = 0 represents a minority of the rows in the table, then a filtered index with included columns can be of value.
For example:
CREATE NONCLUSTERED INDEX ix ON dbo.Aziende(ID) INCLUDE (Field01,Field02,Field03,Field04, [other cols used by select]) WHERE (Field05 = 0)
Good Luck!

After a lot of struggling, I gave up on the idea of adding an index.
No index made any difference.
Instead, I changed the C# code that builds the query: it now tries to work out what kind of value the "something" parameter received by the function is.
If it is of type 1, then I build a WHERE on Field01
If it is of type 2, then I build a WHERE on Field02
If it is of type 3, then I build a WHERE on Field03
If it is of type 4, then I build a WHERE on Field04
This way, execution time dropped to about a quarter of what it was before.
Customers are satisfied.
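The answer does this in C#; for readers who prefer to keep the logic in the database, the same idea can be sketched as a T-SQL stored procedure. The procedure name dbo.SearchAziende, the @SearchType codes 1 to 4 and the @Term parameter are all hypothetical, and how the caller works out the type is left out. Each call evaluates only one LIKE predicate instead of four.

```sql
-- Hypothetical procedure: search a single column chosen by @SearchType
CREATE PROCEDURE dbo.SearchAziende
    @SearchType int,             -- 1 = Field01, 2 = Field02, 3 = Field03, 4 = Field04
    @Term       nvarchar(255)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Pattern nvarchar(257) = N'%' + @Term + N'%';

    -- Only one branch runs, so only one LIKE is evaluated per row
    IF @SearchType = 1
        SELECT ID, Field01, Field02, Field03, Field04 FROM dbo.Aziende
        WHERE Field05 = 0 AND Field01 LIKE @Pattern;
    ELSE IF @SearchType = 2
        SELECT ID, Field01, Field02, Field03, Field04 FROM dbo.Aziende
        WHERE Field05 = 0 AND Field02 LIKE @Pattern;
    ELSE IF @SearchType = 3
        SELECT ID, Field01, Field02, Field03, Field04 FROM dbo.Aziende
        WHERE Field05 = 0 AND Field03 LIKE @Pattern;
    ELSE IF @SearchType = 4
        SELECT ID, Field01, Field02, Field03, Field04 FROM dbo.Aziende
        WHERE Field05 = 0 AND Field04 LIKE @Pattern;
END;
```

Usage example: EXEC dbo.SearchAziende @SearchType = 3, @Term = N'something';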

Related

Clustered Index Vs Non-Clustered Index Usage

I am new to query optimization in T-SQL and I am a bit confused with one of the implementations.
Here is the scenario: I have a table (Table A) that only receives inserts, never updates, and its data is moved to another table (Table B) based on a filter on a particular column in Table A (Col-2).
The two columns in Table A that I am focusing on are Col-1 (an identity column) and Col-2 (nvarchar(20), which contains duplicates).
Col-2 is the column I filter on when moving my data from Table A to Table B.
Should I define a clustered index on Col-1 and a nonclustered index on Col-2, since I am filtering on Col-2, or should I only define a nonclustered index on Col-2 to speed up query performance?
Or should I keep the table as a heap and only define a nonclustered index on Col-2?
Moreover, would defining a clustered index, and thus storing the table as a B-tree, degrade performance, given that we append data to Table A weekly through inserts?
Thanks for the help.

As many here have said, it's hard to say definitively what the best solution is without testing. However, you say that you are filtering by col2 before choosing to move data. Depending on what percentage of those records are moved, I would suggest starting with clustering on the unique col1, then creating a non-clustered index on col2. One advantage of the non-clustered index is that you can make it a filtered index with a WHERE clause. So, for example, if only 10% of your records have a col2 value from a few choices that you care about, the index with 'WHERE col2 IN (val1, val2, val3)' will be roughly 10x smaller and therefore faster to access.
If you go this route, make sure the WHERE clause in your SELECT matches the WHERE clause you specify on the index.
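A sketch of that setup, using the question's [Table A], [Col-1] and [Col-2] names (the brackets are needed because of the space and hyphens) and assuming the dbo schema. The index names and the values 'val1', 'val2', 'val3' are placeholders.

```sql
-- Clustered index on the unique identity column
CREATE UNIQUE CLUSTERED INDEX CIX_TableA_Col1
ON dbo.[Table A] ([Col-1]);
GO
-- Filtered nonclustered index holding only the Col-2 values that get moved
CREATE NONCLUSTERED INDEX IX_TableA_Col2_Filtered
ON dbo.[Table A] ([Col-2])
WHERE [Col-2] IN (N'val1', N'val2', N'val3');
GO
-- The query's WHERE clause matches the index filter, so the filtered index can be used
SELECT [Col-1], [Col-2]
FROM dbo.[Table A]
WHERE [Col-2] IN (N'val1', N'val2', N'val3');
```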

SQL index for date range query

For a few days, I've been struggling to improve the performance of my database, and there are some issues regarding indexing in a SQL Server database that I'm still kind of confused about.
I'll try to be as informative as I can.
My database currently contains about 100k rows and will keep growing, therefore I'm trying to find a way to make it faster.
I'm also writing to this table, so if your suggestion would drastically slow down writes, please let me know.
The overall goal is to select all rows with a specific name that fall within a date range.
That will usually mean selecting over 3,000 rows out of a lot, lol...
Table schema:
CREATE TABLE [dbo].[reports]
(
[id] [int] IDENTITY(1,1) NOT NULL,
[IsDuplicate] [bit] NOT NULL,
[IsNotValid] [bit] NOT NULL,
[Time] [datetime] NOT NULL,
[ShortDate] [date] NOT NULL,
[Source] [nvarchar](350) NULL,
[Email] [nvarchar](350) NULL,
CONSTRAINT [PK_dbo.reports]
PRIMARY KEY CLUSTERED ([id] ASC)
) ON [PRIMARY]
This is the SQL query I'm using:
SELECT *
FROM [db].[dbo].[reports]
WHERE Source = 'name1'
AND ShortDate BETWEEN '2017-10-13' AND '2017-10-15'
As I understood it, my best approach to improve efficiency without hurting write time too much would be to create a nonclustered index on Source and ShortDate.
I did it like this (index definition):
CREATE NONCLUSTERED INDEX [Source&Time]
ON [dbo].[reports]([Source] ASC, [ShortDate] ASC)
Now we get to the tricky part, which got me completely lost: the index above sometimes works, sometimes half works, and sometimes doesn't work at all...
(not sure if it matters, but currently 90% of the database rows have the same Source, although this won't stay like that for long)
With the query below, the index isn't used at all. I'm using SQL Server 2014, and the execution plan shows only a clustered index scan:
SELECT *
FROM [db].[dbo].[reports]
WHERE Source = 'name1'
AND ShortDate BETWEEN '2017-10-10' AND '2017-10-15'
With this query, the index isn't used at all, although SQL Server suggests creating an index with the date first and the source second... I read that the index columns should follow the order used in the query? It also says to include all the columns I'm selecting; is that a must? Again, I read that I should include in the index only the columns I'm searching on.
SELECT *
FROM [db].[dbo].[reports]
WHERE Source = 'name1'
AND ShortDate = '2017-10-13'
SQL Server index suggestion -
/* The Query Processor estimates that implementing the following
index could improve the query cost by 86.2728%. */
/*
USE [db]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[reports] ([ShortDate], [Source])
INCLUDE ([id], [IsDuplicate], [IsNotValid], [Time], [Email])
GO
*/
Now I tried using the index SQL Server suggested, and it works: it seems to use only the nonclustered index for both queries above.
I tried using this index without the included columns, and it doesn't work... it seems I must include in the index all the columns I'm selecting?
By the way, the index I made also works if I include all the columns.
To summarize: it seems the column order of the index didn't matter, as it worked both with Source + ShortDate and with ShortDate + Source.
But for some reason it's a must to include all the columns... (which will drastically affect writes to this table?)
Thanks a lot for reading. My goal is to understand why this happens and what I should do otherwise (not just the solution, as I'll need to apply it to other projects as well).
Cheers :)

Indexing in SQL Server is part know-how from long experience (and many hours of frustration), and part black magic. Don't beat yourself up over that too much - that's what a place like SO is ideal for - lots of brains, lots of experience from many hours of optimizing, that you can tap into.
I read that the index should be made by the order the query is?
If you read this - it is absolutely NOT TRUE - the order of the columns is relevant, but in a different way: a compound index (made up of multiple columns) can only be used for a seek if your query specifies the n left-most columns of the index definition.
Classic example: a phone book with an index on (city, lastname, firstname). Such an index might be used:
in a query that specifies all three columns in its WHERE clause
in a query that uses city and lastname (find all "Miller" in "Detroit")
or in a query that only filters by city
but it can NEVER EVER be used to seek if you want to search only for firstname... that's the trick about compound indexes you need to be aware of. But if you always use all columns from an index, their ordering is typically not really relevant - the query optimizer will handle this for you.
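To make the phone book example concrete, here is a sketch with a hypothetical dbo.PhoneBook table and made-up names. Comparing the actual execution plans of these queries should show which ones can seek: the first three filter on the index's left-most columns, the last one does not.

```sql
-- Hypothetical phone book with a compound index on (city, lastname, firstname)
CREATE TABLE dbo.PhoneBook (
    id        int IDENTITY(1,1) PRIMARY KEY,
    city      nvarchar(50) NOT NULL,
    lastname  nvarchar(50) NOT NULL,
    firstname nvarchar(50) NOT NULL,
    phone     varchar(20)  NULL
);
CREATE NONCLUSTERED INDEX IX_PhoneBook_City_Last_First
ON dbo.PhoneBook (city, lastname, firstname);
GO
-- Can seek: all three key columns
SELECT id FROM dbo.PhoneBook WHERE city = N'Detroit' AND lastname = N'Miller' AND firstname = N'Anna';
-- Can seek: the two left-most key columns
SELECT id FROM dbo.PhoneBook WHERE city = N'Detroit' AND lastname = N'Miller';
-- Can seek: only the left-most key column
SELECT id FROM dbo.PhoneBook WHERE city = N'Detroit';
-- Cannot seek: firstname is not a left-most column, so this needs a scan
SELECT id FROM dbo.PhoneBook WHERE firstname = N'Anna';
```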
As for the included columns: those are stored only in the leaf level of the nonclustered index. They are NOT part of the search structure of the index, so SQL Server cannot seek on them; a WHERE condition on an included column is only checked against the rows that the seek on the key columns has already found.
The main benefit of these included columns is this: if you search in a nonclustered index, and in the end, you actually find the value you're looking for - what do you have available at that point? The nonclustered index will store the columns in the non-clustered index definition (ShortDate and Source), and it will store the clustering key (if you have one - and you should!) - but nothing else.
So in this case, once a match is found, and your query wants everything from that table, SQL Server has to do what is called a Key lookup (often also referred to as a bookmark lookup) in which it takes the clustered key and then does a Seek operation against the clustered index, to get to the actual data page that contains all the values you're looking for.
If you have included columns in your index, then the leaf level page of your non-clustered index contains
the columns as defined in the nonclustered index
the clustering key column(s)
all those additional columns as defined in your INCLUDE statement
If those columns "cover" your query, i.e. provide all the values that your query needs, then SQL Server is done once it finds the value you searched for in the nonclustered index - it can take all the values it needs from that leaf-level page of the nonclustered index, and it does NOT need to do another (expensive) key lookup into the clustered index to get the actual values.
Because of this, trying to always explicitly specify only those columns you really need in your SELECT can be beneficial - in this case, you might be able to create an efficient covering index that provides all the values for your SELECT - always using SELECT * makes that really hard or next to impossible.....
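Applied to the question, a covering index could look like the sketch below. The index name is hypothetical, and the INCLUDE list assumes the query really needs all these columns; since id is the clustering key, the nonclustered index already carries it and it does not need to be listed.

```sql
-- Key columns serve the WHERE clause; INCLUDE carries the selected columns
CREATE NONCLUSTERED INDEX IX_reports_Source_ShortDate_Covering
ON dbo.reports (Source, ShortDate)
INCLUDE (IsDuplicate, IsNotValid, [Time], Email);
GO
-- Explicit column list instead of SELECT *: every column comes from the index
SELECT id, IsDuplicate, IsNotValid, [Time], Email
FROM dbo.reports
WHERE Source = N'name1'
  AND ShortDate BETWEEN '2017-10-13' AND '2017-10-15';
```

Because this particular table has so few columns, the index is effectively a second copy of the table, so every insert writes both; that is the write cost the question asks about. Trimming the SELECT list lets you trim the INCLUDE list too.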

In general, you want the index to be from most selective (i.e. filtering out the most possible records) to least selective; if a column has low cardinality, the query optimizer may ignore it.
That makes intuitive sense: if you have a phone book and you're looking for people called "Smith" with the initial "A", you want to search for "Smith" first and then the "A"s, rather than finding all people whose initial is "A" and then filtering out those called "Smith". After all, if initials were evenly spread, roughly one in 26 people would have the initial "A".
So, in your example, I guess you have a wide range of values in short date - so that's the first column the query optimizer is trying to filter out. You say you have few different values in "source", so the query optimizer may decide to ignore it; in that case, the second column in that index is no use either.
The order of the conditions in your WHERE clause is irrelevant: you can swap them around and get exactly the same results, so the query optimizer ignores that order.
EDIT:
So, yes, make the index. Imagine you have a pile of cards to sort: in your first pass, you want to remove as many cards as possible. Assuming everything is evenly spread, if you have 1000 separate short_dates over a million rows, you end up with 1000 rows if your first pass is on short_date; if you start with source (say, 10 distinct values), you are left with 100000 rows.
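The card-pile arithmetic is easy to check against real data. A sketch, assuming the dbo.reports table from the question; the averages assume an even spread, so a skewed column (like Source, where 90% of rows currently share one value) behaves worse than its average suggests.

```sql
-- Distinct values and average rows per value for each candidate key column
SELECT COUNT(*)                                          AS total_rows,
       COUNT(DISTINCT ShortDate)                         AS distinct_dates,
       COUNT(*) / NULLIF(COUNT(DISTINCT ShortDate), 0)   AS avg_rows_per_date,
       COUNT(DISTINCT Source)                            AS distinct_sources,   -- NULLs not counted
       COUNT(*) / NULLIF(COUNT(DISTINCT Source), 0)      AS avg_rows_per_source
FROM dbo.reports;
```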

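The included columns of an index are for the columns you are selecting.
Because you do SELECT * (which isn't good practice), the index is typically not used, because SQL Server would have to look up each matching row in the table to get the values of the other columns.
For your scenario, I would drop the default clustered index (here the clustered primary key, provided no foreign keys reference it) and create a new clustered index with the following statements:
USE [db]
GO
-- PK_dbo.reports is currently clustered: re-create it as nonclustered first
ALTER TABLE [dbo].[reports] DROP CONSTRAINT [PK_dbo.reports];
ALTER TABLE [dbo].[reports] ADD CONSTRAINT [PK_dbo.reports] PRIMARY KEY NONCLUSTERED ([id]);
GO
CREATE CLUSTERED INDEX CIX_reports
ON [dbo].[reports] ([ShortDate],[Source])
GO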

What "Clustered Index Scan (Clustered)" means on SQL Server execution plan?

I have a query that fails to execute with "Could not allocate a new page for database 'TEMPDB' because of insufficient disk space in filegroup 'DEFAULT'".
While troubleshooting, I am examining the execution plan. There are two costly steps labeled "Clustered Index Scan (Clustered)". I am having a hard time finding out what this means.
I would appreciate any explanations to "Clustered Index Scan (Clustered)" or suggestions on where to find the related document?

I would appreciate any explanations to "Clustered Index Scan (Clustered)"
I will try to put it in the simplest terms; to understand it, you need to understand both index seeks and index scans.
So let's build the table:
use tempdb
GO
create table scanseek (id int, name varchar(50) default ('some random names'))
create clustered index IX_ID_scanseek on scanseek(id)
-- insert 5000 rows: (0, 'Name0') ... (4999, 'Name4999')
declare @i int
SET @i = 0
while (@i < 5000)
begin
insert into scanseek
select @i, 'Name' + convert(varchar(5), @i)
set @i = @i + 1
END
An index seek is where SQL Server uses the B-tree structure of the index to seek directly to matching records.
You can check the root, intermediate and leaf levels of the index using the dynamic management function below:
-- check index level
SELECT
index_level
,record_count
,page_count
,avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID('tempdb'),OBJECT_ID('scanseek'),NULL,NULL,'DETAILED')
GO
Here we have a clustered index on the "id" column.
Let's look for a directly matching record:
select * from scanseek where id =340
and look at the execution plan:
Because the query filters on the clustered index key, you get a Clustered Index Seek.
Clustered index scan: SQL Server reads through the rows of the clustered index from start to end to find the matching row(s).
This happens, for example, when searching on a non-key column. In our table, name is a non-key column, so if we search for data in the name column we will see a clustered index scan, because all the rows live in the clustered index leaf level.
Example:
select * from scanseek where name = 'Name340'
Please note: I kept this answer short for clarity; if you have any questions or suggestions, please comment below.
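To see the scan turn into a seek, you could add a nonclustered index on the non-key column. The index name IX_name_scanseek is hypothetical; the plan described in the comments is what one would typically expect, not a guaranteed result.

```sql
USE tempdb;
GO
-- Nonclustered index on the non-key column "name"
CREATE NONCLUSTERED INDEX IX_name_scanseek ON scanseek(name);
GO
-- Re-run the earlier query: the plan should now typically show an Index Seek on IX_name_scanseek.
-- id is the clustering key and is stored in every nonclustered index row, so this index
-- covers SELECT * (id, name) here and no Key Lookup is needed.
SELECT * FROM scanseek WHERE name = 'Name340';
```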

Expanding on Gordon's answer in the comments, a clustered index scan reads through the table's clustered index to find the rows that match your WHERE clause filter, or the rows needed for a join to the next table in your query plan.
Tables can have multiple indexes (one clustered and many non-clustered) and SQL Server will search the appropriate one based upon the filter or join being executed.
Clustered Indexes are explained pretty well on MSDN. The key difference between clustered and non-clustered is that the clustered index defines how rows are stored on disk.
If your clustered index is very expensive to search due to the number of records, you may want to add a non-clustered index on the table for fields that you search for often, such as date fields used for filtering ranges of records.

A clustered index is one in which the terminal (leaf) node of the index is the actual data page itself. There can be only one clustered index per table, because it specifies how records are arranged within the data page. It is generally (and with some exceptions) considered the most performant index type (primarily because there is one less level of indirection before you get to your actual data record).
A "clustered index scan" means that the SQL engine reads through the leaf level of your clustered index (that is, through the table's data pages) looking for a particular value or set of values. When only a few records are needed, it is beaten by a "clustered index seek", in which the SQL engine navigates the index directly to the matching key value or range.
The error message has absolutely nothing to do with the query plan. It just means that you are out of space on TempDB.

I have been having issues with performance and timeouts due to a clustered index scan. However another seemingly identical database did not have the same issue.
It turns out the COMPATIBILITY_LEVEL setting of the two databases was different: the database with COMPATIBILITY_LEVEL 100 was using the scan, and the one with level 130 wasn't. The performance difference is huge (from more than 1 minute to less than 1 second for the same query).
ALTER DATABASE [mydb] SET COMPATIBILITY_LEVEL = 130
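To check whether two databases differ in this way, you can compare their compatibility levels ('mydb' is the answer's placeholder name). Among other optimizer changes, compatibility levels 120 and above use the newer cardinality estimator, so plans can change in either direction; it is worth testing before changing the level of a production database.

```sql
-- Current compatibility level of the database (run for both databases and compare)
SELECT name, compatibility_level
FROM sys.databases
WHERE name = N'mydb';
```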

If you hover over the step in the query plan, SSMS displays a description of what the step does. That will give you a baseline understanding of "Clustered Index Scan (Clustered)" and all other steps involved.

Optimizing my SQL queries - picking the right indexes

I have a basic table as follows.
create table Orders
(
ID INT IDENTITY(1,1) PRIMARY KEY,
Company VARCHAR(3),
ItemID INT,
BoxID INT,
OrderNum VARCHAR(5),
Status VARCHAR(5)
--about 10 more columns, varchars and ints and dates
)
I'm trying to optimize all my SQL since I am getting a fair few deadlocks and some slowness - but I'm no expert on this sort of thing!
I created a few indexes:
Clustered on the ID (Primary Key).
Non-Clustered index on ([ItemID])
Non-Clustered index on ([BoxID])
Non-Clustered index on ([Company],[OrderNum],[Status])
Maybe 1 or 2 more on some other columns
But I'm not 100% happy with the results.
SELECT * FROM Orders WHERE ItemID=100
Gives me an index seek + a key lookup and a Nested loop (Inner join).
I can see why, but I don't know if I should do anything about it. The key lookup is 97% of the batch, which seems bad!
Every query I use pulls back every column in the table, but I don't like the idea of including every column in the index.
I'm now changing all queries to filter on the [Company] field. Every query will use it, because results should never contain more than one company value. So they will all change:
SELECT * FROM Orders WHERE ItemID=100 --Old
SELECT * FROM Orders WHERE Company='a' and ItemID=100 --New
But the execution plan of that gives me exactly the same as not including company (which does surprise me!).
Why are the two execution plans above the same? (I have no index on [Company] at the moment)
Is it worth adding [Company] to all my indexes, since it seems to make zero difference to the execution plan?
Should I instead just add a single index on [Company] and keep the original indexes? But will that mean every query will have 2 seeks?
Is it worth 'including' all other columns in my indexes to avoid the key lookup (making the index a tonne bigger, but potentially speeding it up)? i.e.
CREATE NONCLUSTERED INDEX [IX_Orders_MyIndex] ON [Orders]
( [Company] ASC, [OrderNum] ASC, [Status] ASC )
INCLUDE ([ID],[ItemID],[BoxID],
[Column5],[Column6],[Column7],[Column8],[Column9],[Column10],etc)
That seems messy if I did it on 4 or 5 indexes.
Basically I have 4-5 queries which run quite often (some selects and updates) so I want to make it as efficient as possible.
All queries will use the [Company] field and at least one other. How should I go about it?
Any help appreciated :)

In your execution plan, you say that lookup takes 97% of the batch.
In this case that figure doesn't mean much, because an index seek is very fast and there wasn't much work to be done overall.
The lookup is the step that reads the rest of the record for each row found through the index you have specified.
Why are the two execution plans above the same? (I have no index on [company] at the moment)
Non-Clustered index on ([Company],[OrderNum],[Status])
This index can only be used for a seek when your WHERE clause filters on its leading column(s): Company; Company and OrderNum; or all three.
Concatenated indexes generate a key that looks something like CompanyOrderNumStatus; when you pass only Company, you get an incomplete key that needs a wildcard for the other two values.
It works a little like key LIKE 'XXX%': SQL Server can still seek on the Company prefix, but ItemID is not part of this index, so every matching row would need a key lookup just to check ItemID.
The optimizer therefore determines that it's preferable to first seek rows from the ItemID index and then check those rows for the required company.
Is it worth adding [Company] to all my indexes since it seems to make 0 different to the execution plan?
You should consider having a Company index instead of adding it to all your indexes.
A composite index could speed things up by reducing the number of nested loops, but you have to think it through thoroughly.
The order of the fields you add to such an index is very important: they should be ordered by uniqueness (most selective first) to allow a better seek. Also, you should never add a field that might not be used in a query.
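A sketch of such a composite index for the question's most common query. It assumes ItemID is more selective than Company (a three-character company code), so, following the ordering advice above, ItemID goes first; the index name is hypothetical. SELECT * will still need a key lookup for the remaining columns.

```sql
-- Hypothetical composite index: most selective column (assumed ItemID) first
CREATE NONCLUSTERED INDEX IX_Orders_ItemID_Company
ON dbo.Orders (ItemID, Company);
GO
-- Both predicates can be resolved by a single seek on the composite key
SELECT * FROM dbo.Orders WHERE Company = 'a' AND ItemID = 100;
```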
Should I instead just add 1 single index to [Company] and keep the original indexes? - but will that mean every query will have 2 seeks?
Having more than one index seek is not all that bad: SQL Server can seek both indexes and then match their results together (an index intersection).
Is it worth 'including' all other columns in my indexes to avoid the key lookup? (making the index a tonne bigger, but potentially speeding it up?)
It is worth it when it's only a few fields that could be optional in the WHERE clause, or when you have queries that select only those fields when using the specified index.
Last notes
Not all indexes are equal: comparing strings (varchar) is not the same as comparing numbers (integer, datetime, bytes, etc.).
Also, keeping them clean helps a lot: heavily fragmented indexes lose much of their performance benefit.
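One way to check fragmentation, sketched for the question's dbo.Orders table. A commonly cited rule of thumb is to reorganize between roughly 5% and 30% fragmentation and rebuild above that, and to ignore very small indexes (a few pages).

```sql
-- Fragmentation of every index on dbo.Orders ('LIMITED' is the cheapest scan mode)
SELECT i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id;
GO
-- Lighter-weight, online defragmentation; use REBUILD instead for heavy fragmentation
ALTER INDEX ALL ON dbo.Orders REORGANIZE;
```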

Why am I getting a Clustered Index Scan when the column is indexed?

So, we have a table, InventoryListItems, that has several columns. Because we're going to be looking for rows at times based on a particular column (g_list_id, a foreign key), we have that foreign key column placed into a non-clustered index we'll call MYINDEX.
So when I search for data like this:
-- fake data for example
DECLARE @ListId uniqueidentifier
SELECT @ListId = '7BCD0E9F-28D9-4F40-BD67-803005179B04'
SELECT *
FROM [dbo].[InventoryListItems]
WHERE [g_list_id] = @ListId
I expected that it would use the MYINDEX index to find just the needed rows, and then look up the information in those rows. So not as good as just finding everything we need in the index itself, but still a big win over doing a full scan of the table.
But instead it seems that I'm still getting a clustered index scan. I can't figure out why that would happen.
If I do something like SELECTing only the values in the included columns of the index, it does what I would expect, an index seek, and just pulls everything from the index.
But if I SELECT *, why does it just bail on the index and do a scan when it seems like it would still benefit greatly from using it because it's referenced in the WHERE clause?

Since you're doing a SELECT * and thus you retrieve all columns, SQL Server's query optimizer may have decided it's easier and more efficient to just do a clustered index scan - since it needs to go to the clustered index leaf level to get all the columns anyway (and doing a seek first, and then a key lookup to actually get the whole data page, is quite an expensive operation - scan might just be more efficient in this setup).
I'm almost sure if you try
SELECT g_list_id
FROM [dbo].[InventoryListItems]
WHERE [g_list_id] = @ListId
then there will be an index seek instead (since you're only retrieving a single column - not everything).
That's one of the reasons why I would recommend to be extra careful when using SELECT * .... - try to avoid it if ever possible.
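To see why the optimizer prefers the scan, you can compare the I/O of both plans directly. MYINDEX is the question's placeholder index name; the INDEX hint forces the seek plus key lookups for comparison only and is not meant to stay in production code.

```sql
DECLARE @ListId uniqueidentifier = '7BCD0E9F-28D9-4F40-BD67-803005179B04';
SET STATISTICS IO ON;   -- logical reads per table appear on the Messages tab

-- Plan the optimizer chooses on its own (the clustered index scan)
SELECT *
FROM [dbo].[InventoryListItems]
WHERE [g_list_id] = @ListId;

-- Forced seek on MYINDEX plus key lookups, for comparison
SELECT *
FROM [dbo].[InventoryListItems] WITH (INDEX(MYINDEX))
WHERE [g_list_id] = @ListId;

SET STATISTICS IO OFF;
```

If the forced version consistently reads far fewer pages, the optimizer's row estimate is probably off (with a local variable it cannot see the actual value at compile time and falls back to an average); if it reads more, the scan was the right choice.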
