Index scan on SQL update statement

I have the following SQL statement, which I would like to make more efficient. Looking through the execution plan, I can see that there is a Clustered Index Scan on @newWebcastEvents. Is there a way I can make this into a seek? Or are there any other ways I can make the statement below more efficient?
declare @newWebcastEvents table (
webcastEventId int not null,
primary key clustered (webcastEventId) with (ignore_dup_key = off)
)
insert into @newWebcastEvents
select wel.WebcastEventId
from WebcastChannelWebcastEventLink wel with (nolock)
where wel.WebcastChannelId = 1178
Update WebcastEvent
set WebcastEventTitle = LEFT(WebcastEventTitle, CHARINDEX('(CLONE)',WebcastEventTitle,0) - 2)
where
WebcastEvent.WebcastEventId in (select webcastEventId from @newWebcastEvents)

The @newWebcastEvents table variable contains only a single column, and you're asking for all rows of that table variable in this WHERE clause:
where
WebcastEvent.WebcastEventId in (select webcastEventId from @newWebcastEvents)
So doing a seek on this clustered index is usually rather pointless: the SQL Server query optimizer will need all columns and all rows of that table variable anyway, so it chooses an index scan.
I don't think this is a performance issue, anyway.
An index seek is useful if you need to pick a very small number of rows (<= 1-2% of the total) from a large table. In that case, navigating the clustered index tree to find those few rows makes a lot more sense than scanning the whole table. But here, with a single int column and 15 rows, a seek is pointless; it will be much faster to just read those 15 int values in a single scan and be done with it.
Update: not sure if it makes any difference in terms of performance, but I personally prefer to use joins rather than subselects for "connecting" two tables:
UPDATE we
SET we.WebcastEventTitle = LEFT(we.WebcastEventTitle, CHARINDEX('(CLONE)', we.WebcastEventTitle, 0) - 2)
FROM dbo.WebcastEvent we
INNER JOIN #newWebcastEvents nwe ON we.WebcastEventId = nwe.webcastEventId

Related

Performance Tuning SQL

I have the following SQL. When I check the execution plan of this query, I observe an index scan. How do I replace it with an index seek? I have a nonclustered index on the IsDeleted column.
SELECT Cast(Format(Sum(COALESCE(InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet,
BP.BoundProjectId AS ProjectId
FROM BoundProducts BP
WHERE BP.IsDeleted=0 or BP.IsDeleted is null
GROUP BY BP.BoundProjectId
I tried it like this and got an index seek, but the result was wrong.
SELECT Cast(Format(Sum(COALESCE(InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet,
BP.BoundProjectId AS ProjectId
FROM BoundProducts BP
WHERE BP.IsDeleted=0
GROUP BY BP.BoundProjectId
Can anyone suggest how to get the right result set with an index seek? That is, how do I rewrite the (BP.IsDeleted=0 or BP.IsDeleted is null) condition so that it can make use of an index seek?
Edit: row counts, added from the comments on one of the answers:
null: 254962 rows
0: 392002 rows
1: 50211 rows
You're not getting an index seek because you're fetching almost 93% of the rows in the table, and in that kind of scenario just scanning the whole index is faster and cheaper.
If you have performance issues, you should look into removing the FORMAT() function, especially if the query returns a lot of rows. Read more from this blog post.
Another option might be to create an indexed view and pre-aggregate your data. This of course adds overhead to update/insert operations, so consider it only if the query runs very often compared with how often the table is updated.
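As a rough sketch of what such an indexed view could look like (the view and index names here are made up, and you should verify the view meets all indexed-view requirements, e.g. SCHEMABINDING, COUNT_BIG(*), and a non-nullable SUM input; note also that on non-Enterprise editions the query needs a WITH (NOEXPAND) hint to use it):
CREATE VIEW dbo.vBoundProductsTotals
WITH SCHEMABINDING
AS
SELECT BP.BoundProjectId,
       SUM(ISNULL(BP.InstalledSubtotal, 0)) AS TotalSoldNet,
       COUNT_BIG(*) AS CntBig -- required in an indexed view with GROUP BY
FROM dbo.BoundProducts BP
WHERE BP.IsDeleted = 0 OR BP.IsDeleted IS NULL
GROUP BY BP.BoundProjectId
GO
CREATE UNIQUE CLUSTERED INDEX IX_vBoundProductsTotals
    ON dbo.vBoundProductsTotals (BoundProjectId)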
Have you tried a UNION ALL?
select ...
where IsDeleted = 0
UNION ALL
select ...
where IsDeleted is null
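Fleshed out against the query from the question, one sketch that keeps the aggregation correct when a project has rows in both branches is to group after the union (note that, given the row counts above, the optimizer may still decide a scan is cheaper):
SELECT CAST(FORMAT(SUM(u.InstalledSubtotal), 'F') AS MONEY) AS TotalSoldNet,
       u.BoundProjectId AS ProjectId
FROM (
    SELECT BoundProjectId, COALESCE(InstalledSubtotal, 0) AS InstalledSubtotal
    FROM BoundProducts
    WHERE IsDeleted = 0
    UNION ALL
    SELECT BoundProjectId, COALESCE(InstalledSubtotal, 0)
    FROM BoundProducts
    WHERE IsDeleted IS NULL
) AS u
GROUP BY u.BoundProjectId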
Or you can add an index hint: WITH (INDEX([indexname])).
Also be aware that cardinality will determine whether SQL Server uses a seek or a scan: seeks are quicker, but can require key lookups if the index doesn't cover all the columns required. A good rule of thumb is that if the query will return more than about 5% of the table, SQL Server will prefer a scan.

Optimizing my SQL queries - picking the right indexes

I have a basic table as follows.
create table Orders
(
ID INT IDENTITY(1,1) PRIMARY KEY,
Company VARCHAR(3),
ItemID INT,
BoxID INT,
OrderNum VARCHAR(5),
Status VARCHAR(5),
--about 10 more columns, varchars and ints and dates
)
I'm trying to optimize all my SQL since I am getting a fair few deadlocks and some slowness - but I'm no expert on this sort of thing!
I created a few indexes:
Clustered on the ID (Primary Key).
Non-Clustered index on ([ItemID])
Non-Clustered index on ([BoxID])
Non-Clustered index on ([Company],[OrderNum],[Status])
Maybe 1 or 2 more on some other columns
But I'm not 100% happy with the results.
SELECT * FROM Orders WHERE ItemID=100
Gives me an index seek + a key lookup and a Nested loop (Inner join).
I can see why, but I don't know if I should do anything about it. The key lookup is 97% of the batch, which seems bad!
Every query used will pull back every column in the table, but I don't like the idea of including every column in the index.
I'm making a change now to query everything on the [Company] field. Every query will be using it, because results should never span more than one company. So they will all change:
SELECT * FROM Orders WHERE ItemID=100 --Old
SELECT * FROM Orders WHERE Company='a' and ItemID=100 --New
But the execution plan of that is exactly the same as the one without Company (which does surprise me!).
Why are the two execution plans above the same? (I have no index on [Company] at the moment.)
Is it worth adding [Company] to all my indexes, since it seems to make no difference to the execution plan?
Should I instead just add one single index on [Company] and keep the original indexes? Will that mean every query will have two seeks?
Is it worth 'including' all other columns in my indexes to avoid the key lookup (making the index a tonne bigger, but potentially speeding it up)? i.e.
CREATE NONCLUSTERED INDEX [IX_Orders_MyIndex] ON [Orders]
( [Company] ASC, [OrderNum] ASC, [Status] ASC )
INCLUDE ([ID],[ItemID],[BoxID],
[Column5],[Column6],[Column7],[Column8],[Column9],[Column10],etc)
That seems messy if I did it on 4 or 5 indexes.
Basically I have 4-5 queries which run quite often (some selects and updates) so I want to make it as efficient as possible.
All queries will use the [Company] field, and at least one other. How should I go about it?
Any help appreciated :)
In your execution plan, the key lookup takes 97% of the batch. In this case that doesn't mean much, because an index seek is very fast and there isn't much other work to be done. The lookup is simply the step that reads the actual row found via the index you specified.
Why are the two execution plans above the same? (I have no index on [company] at the moment)
Non-Clustered index on ([Company],[OrderNum],[Status])
This index will be considered only if Company, OrderNum and Status appear in your WHERE clause.
A concatenated index generates a key that looks something like 0000000000000 by combining the values of the indexed columns; when you pass only Company, you create an incomplete key that requires a wildcard for the other two values.
It would look a little like this: key LIKE 'XXX%', and that logic will require an index scan, which is time-consuming.
The optimizer will determine that it's preferable to first seek the rows from the ItemID index and then scan those to match any with the required Company.
Is it worth adding [Company] to all my indexes since it seems to make 0 different to the execution plan?
You should consider having a dedicated Company index instead of adding Company to all your indexes.
A composite index can speed things up by reducing the number of nested loops, but you have to think it through thoroughly.
The order of the fields you add to such an index is very important: they should be ordered by uniqueness to allow a better seek. Also, you should never add a field that might not be used in a query.
Should I instead just add 1 single index to [Company] and keep the original indexes? - but will that mean every query will have 2 seeks?
Having more than one index seek is not all that bad; they are usually parallelized, and only the results of both are matched together.
Is it worth 'including' all other columns in my indexes to avoid the key lookup? (making the index a tonne bigger, but potentially speeding it up?)
It is worth it when only a few of the fields could be optional in the WHERE clause, or when you have queries that select only those fields while using the specified index.
Last notes
Not all indexes are equal: comparing strings (varchar) is not the same as comparing numbers (integer, datetime, bytes, etc.).
Also, keeping them clean helps a lot: if your indexes are fragmented, they will be next to useless in terms of performance gain.
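For that last point, a minimal maintenance sketch (using the index from the question; the thresholds are a common rule of thumb, not a hard rule):
-- Reorganize for light fragmentation (roughly 5-30%):
ALTER INDEX [IX_Orders_MyIndex] ON [Orders] REORGANIZE;
-- Rebuild when fragmentation is heavy (above ~30%):
ALTER INDEX [IX_Orders_MyIndex] ON [Orders] REBUILD;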

Why am I getting a Clustered Index Scan when the column is indexed?

So, we have a table, InventoryListItems, that has several columns. Because we're going to be looking for rows at times based on a particular column (g_list_id, a foreign key), we have that foreign key column placed into a non-clustered index we'll call MYINDEX.
So when I search for data like this:
-- fake data for example
DECLARE @ListId uniqueidentifier
SELECT @ListId = '7BCD0E9F-28D9-4F40-BD67-803005179B04'
SELECT *
FROM [dbo].[InventoryListItems]
WHERE [g_list_id] = @ListId
I expected that it would use the MYINDEX index to find just the needed rows, and then look up the information in those rows. So not as good as just finding everything we need in the index itself, but still a big win over doing a full scan of the table.
But instead it seems that I'm still getting a clustered index scan. I can't figure out why that would happen.
If I do something like SELECTing only the values in the included columns of the index, it does what I would expect, an index seek, and just pulls everything from the index.
But if I SELECT *, why does it just bail on the index and do a scan when it seems like it would still benefit greatly from using it because it's referenced in the WHERE clause?
Since you're doing a SELECT * and thus retrieve all columns, SQL Server's query optimizer may have decided it's easier and more efficient to just do a clustered index scan: it needs to go to the clustered index leaf level to get all the columns anyway, and doing a seek first, then a key lookup for each matching row to fetch the full data row, is quite an expensive operation. A scan might just be more efficient in this setup.
I'm almost sure if you try
SELECT g_list_id
FROM [dbo].[InventoryListItems]
WHERE [g_list_id] = @ListId
then there will be an index seek instead (since you're only retrieving a single column - not everything).
That's one of the reasons why I would recommend being extra careful when using SELECT * .... Try to avoid it if ever possible.
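If a query really needs only a handful of columns, another option is to make MYINDEX covering so the seek alone can answer it. A sketch, where the INCLUDEd column names are hypothetical stand-ins for whatever the query actually selects:
CREATE NONCLUSTERED INDEX MYINDEX
    ON dbo.InventoryListItems (g_list_id)
    INCLUDE (col1, col2) -- hypothetical: list the columns your query selects
    WITH (DROP_EXISTING = ON);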

sql server query performance where clause

I've got a table like this with 10,000 rows.
declare @a table
(
id bigint not null,
nm varchar(100) not null,
filter bigint,
primary key (id)
)
A SELECT with 4-5 joins is taking x seconds. If a WHERE clause is added, it's now taking 3x seconds. The WHERE clause:
filter = @filder or
filter is null
I applied a nonclustered index on the column, but I'm only getting about a 10% gain in performance.
Any tips?
Edit: the performance issue happens when the filter column is added. All joins are on primary keys.
I have a few thoughts on this:
Chances are that your joins are on table.id, which is a primary key and has an index: bingo, high selectivity (because the values are unique). With it being indexed, the optimizer can really optimize access to this table when it is used in joins.
I'm not 100% sure, but either you do not have an index on filter or it is not selective enough. If you do not have an index, the optimizer will use a table scan. If you do have an index but it is not selective enough, it will use a table scan anyway. Scans are expensive.
Even if you do have an index on filter, the optimizer does not like OR predicates. When using an OR, the optimizer might end up using an index scan instead of an index seek. Try using this instead: @filder = ISNULL(filter, @filder), as @sut13 suggested.
So to improve the performance: add an index on filter if you do not have one and adjust your where clause to not use OR as I have suggested.
Also:
You shouldn't expect the query with the WHERE filter to perform equal to or better than the query with just the 4-5 joins. If the query with only the joins is more selective and makes better use of indexes, it is going to perform better.
It's likely that the lack of index (based on what you described on the structure) on the filter column is leading to a table scan. The only way to be sure is to take a look at the execution plan for the query. That will tell you what the optimizer is doing with the query and usually give you enough information so you can understand why it's doing that and what you need to do to fix it.
Probably, you need an index on the filter column. But the 'OR filter IS NULL' part might lead to a scan anyway, depending on how many null values are in the data.
If you use the ISNULL rewrite as outlined, unfortunately that's a function on the column, and it will probably (depending on the indexes used, other columns in the WHERE clause that can be used to filter the data first, etc.) result in a scan and not a seek.
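An OR-free alternative that avoids wrapping the column in a function is the UNION ALL pattern mentioned in an earlier answer above. A sketch against the table from the question (the two branches are mutually exclusive, so UNION ALL introduces no duplicates; the real query would wrap its joins around each branch):
select id, nm, filter from @a where filter = @filder
union all
select id, nm, filter from @a where filter is null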

SQL Server 2005 performance issue with DISTINCT

I have a table tblStkMst2 which has 87 columns and 53,000 rows. If I execute the following query it takes 83 to 96 milliseconds (Core2 Duo, 2.8 GHz, 2 GB of RAM). But when I use the DISTINCT keyword it takes 1086 to 1103 milliseconds (more than 1 second). That is really expensive; applying a duplicate-removal algorithm to 53,000 rows of data does not take 1 second.
Is there any other way in SQL Server 2005 to improve execution time?
declare @monthOnly int set @monthOnly = 12
declare @yearOnly int set @yearOnly = 2011
SELECT --(distinct)--
tblSModelMst.SMNo as [ModelID]
,tblSModelMst.Vehicle as [ModelName]
FROM tblStkMst2
INNER JOIN tblDCDetail ON tblStkMst2.DCNo = tblDCDetail.DCNo AND tblDCDetail.Refund=0
INNER JOIN tblSModelMst ON tblStkMst2.SMno = tblSModelMst.SMNo
INNER JOIN tblBuyerMst ON tblDCDetail.BNo = tblBuyerMst.BNo
LEFT OUTER JOIN tblSModelSegment ON tblSModelMst.SMSeg = tblSModelSegment.ID
left outer JOIN dbo.tblProdManager as pd ON pd.PMID = tblBuyerMst.PMId
WHERE (pd.Active = 1) AND ((tblStkMst2.ISSFlg = 1) or (tblStkMst2.IsBooked = 1))
AND (MONTH(tblStkMst2.SIssDate) = @monthOnly) AND (YEAR(tblStkMst2.SIssDate) = @yearOnly)
It is not that DISTINCT is very expensive (this is only 53,000 rows, which is tiny). You are seeing a significant performance difference because SQL Server chooses a completely different query plan when you add DISTINCT. Without seeing the query plans, it is very difficult to tell what is happening.
There are a couple of things in your query though which you could do better which could significantly improve performance.
(1) Avoid where clauses like this where you need to transform a column:
AND (MONTH(tblStkMst2.SIssDate) = @monthOnly) AND (YEAR(tblStkMst2.SIssDate) = @yearOnly)
If you have an index on the SIssDate column, SQL Server won't be able to use it (it will likely do a table scan, as I suspect it won't be able to use another index).
If you want to take advantage of the SIssDate index, it is better to convert the @monthOnly/@yearOnly parameters into a min and max date and use those in the query:
AND (tblStkMst2.SIssDate between @minDate and @maxDate);
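Deriving those boundaries from the original parameters might look like this (a sketch; DATEADD arithmetic is used because DATEFROMPARTS does not exist on SQL Server 2005, and a half-open >= / < range avoids end-of-month edge cases that BETWEEN can hit):
declare @minDate datetime, @maxDate datetime
set @minDate = DATEADD(month, (@yearOnly - 1900) * 12 + @monthOnly - 1, 0) -- first day of the month
set @maxDate = DATEADD(month, 1, @minDate)                                 -- first day of the next month
-- then: AND (tblStkMst2.SIssDate >= @minDate AND tblStkMst2.SIssDate < @maxDate)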
If you have a surrogate primary key (which is the clustered index) on the table, it may be useful to do this before you run your query (assuming your surrogate primary key column is called tblStkMst2_id):
SELECT @minId = MIN(tblStkMst2_id), @maxId = MAX(tblStkMst2_id)
FROM tblStkMst2
WHERE tblStkMst2.SIssDate between @minDate and @maxDate;
This should be very fast, as SQL Server should not even need to look at the table itself (just at the SIssDate non-clustered index and the tblStkMst2_id clustered index).
Then you can do this in your main query (instead of the date check):
AND (tblStkMst2.tblStkMst2_id BETWEEN @minId and @maxId);
Using the clustered index is much faster than using a non-clustered index as the DB will be able to sequentially access these records (rather than going through the non-clustered index redirect).
(2) Delay the join to tblStkMst2 until after you do the DISTINCT (or GROUP BY). The fewer entries in the DISTINCT (GROUP BY) the better.
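One way to read that suggestion, as a sketch: the DISTINCT runs over a single int column before the model table is joined. Note that the original LEFT JOIN on tblProdManager behaves as an inner join anyway because of the WHERE pd.Active = 1 filter, tblSModelSegment is dropped here since nothing is selected from it, and the @minDate/@maxDate range from point (1) is reused:
SELECT m.SMNo as [ModelID], m.Vehicle as [ModelName]
FROM (
    SELECT DISTINCT s.SMno
    FROM tblStkMst2 s
    INNER JOIN tblDCDetail dc ON s.DCNo = dc.DCNo AND dc.Refund = 0
    INNER JOIN tblBuyerMst b ON dc.BNo = b.BNo
    INNER JOIN dbo.tblProdManager pd ON pd.PMID = b.PMId
    WHERE pd.Active = 1
      AND (s.ISSFlg = 1 OR s.IsBooked = 1)
      AND s.SIssDate >= @minDate AND s.SIssDate < @maxDate
) AS x
INNER JOIN tblSModelMst m ON x.SMno = m.SMNo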
SQL Server optimizes to avoid worst-case execution. This can lead it to prefer a suboptimal algorithm, like preferring a disk-based sort over a hash aggregate, just to be on the safe side.
For a limited number of distinct values, a hash aggregate is the fastest way to execute a DISTINCT operation. A hash aggregate trades memory for execution speed. But if you have a large number of values, it breaks down, because the hash table becomes too large to store in memory. So you need a way to tell SQL Server that the hash will fit into memory.
One possible way to do that is to use a temporary table:
declare @t table (ModelID int, ModelName varchar(50))
insert @t (ModelID, ModelName) select ...your original query here...
select distinct ModelID, ModelName from @t
SQL Server will know the size of the temporary table, allowing it to choose a better algorithm in many cases.
Several ways.
1 - Don't use DISTINCT
2 - Create an index on tblSModelMst(SMNo) INCLUDE (Vehicle), and index your other JOIN keys.
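Suggestion 2 spelled out as a sketch (the index name is made up):
CREATE NONCLUSTERED INDEX IX_tblSModelMst_SMNo
    ON dbo.tblSModelMst (SMNo)
    INCLUDE (Vehicle);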
You really should figure out why you get duplicates and take care of that first. It's likely additional matching rows in one or more of your JOINed tables.
DISTINCT has its place, but it is heavily overused to obscure data issues, and it's a very expensive operator, especially when you are filtering down from a large number of rows.
To get a more complete answer you need to explain your data structure and what you are trying to achieve.