Performance Tuning SQL

I have the following SQL. When I check the execution plan of this query, I observe an index scan. How do I replace it with an index seek? I have a non-clustered index on the IsDeleted column.
SELECT Cast(Format(Sum(COALESCE(InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet,
BP.BoundProjectId AS ProjectId
FROM BoundProducts BP
WHERE BP.IsDeleted=0 or BP.IsDeleted is null
GROUP BY BP.BoundProjectId
I tried it like this and got an index seek, but the results were wrong.
SELECT Cast(Format(Sum(COALESCE(InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet,
BP.BoundProjectId AS ProjectId
FROM BoundProducts BP
WHERE BP.IsDeleted=0
GROUP BY BP.BoundProjectId
Can anyone kindly suggest how to get the right result set with an index seek?
That is, how do I rewrite the (BP.IsDeleted=0 or BP.IsDeleted is null) condition so that it can use an index seek?
Edit: added row counts from the comments on one of the answers:
null: 254962 rows
0: 392002 rows
1: 50211 rows

You're not getting an index seek because you're fetching almost 93% of the rows in the table, and in that kind of scenario just scanning the whole index is faster and cheaper.
If you have performance issues, you should look into removing the FORMAT() function, especially if the query returns a lot of rows. Read more in this blog post.
Another option might be to create an indexed view and pre-aggregate your data. This of course adds overhead to update/insert operations, so consider it only if the query runs really often relative to how often the table is updated.
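For illustration, a minimal sketch of such an indexed view (the view and index names are made up; indexed views need SCHEMABINDING, a SUM over a nullable column must be wrapped in ISNULL, and a grouped indexed view must carry a COUNT_BIG(*)):

CREATE VIEW dbo.vw_BoundProductTotals -- hypothetical name
WITH SCHEMABINDING
AS
SELECT BP.BoundProjectId,
       SUM(ISNULL(BP.InstalledSubtotal, 0)) AS TotalSoldNet,
       COUNT_BIG(*) AS RowCnt -- required in an indexed view with GROUP BY
FROM dbo.BoundProducts BP
WHERE BP.IsDeleted = 0 OR BP.IsDeleted IS NULL
GROUP BY BP.BoundProjectId;
GO

-- The unique clustered index is what actually materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vw_BoundProductTotals
    ON dbo.vw_BoundProductTotals (BoundProjectId);

Queries can then read the pre-aggregated rows (on non-Enterprise editions, via the WITH (NOEXPAND) table hint) instead of re-aggregating ~650k base rows on every run.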

Have you tried a UNION ALL?
select ...
Where isdeleted = 0
UNION ALL
select ...
Where isdeleted is null
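Spelled out against the query from the question, note that the aggregation has to happen over the union; gluing together two separately grouped SELECTs would return two rows per project. A sketch:

SELECT Cast(Format(Sum(COALESCE(InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet,
       BP.BoundProjectId AS ProjectId
FROM (
    SELECT InstalledSubtotal, BoundProjectId
    FROM BoundProducts
    WHERE IsDeleted = 0
    UNION ALL
    SELECT InstalledSubtotal, BoundProjectId
    FROM BoundProducts
    WHERE IsDeleted IS NULL
) AS BP
GROUP BY BP.BoundProjectId

Each branch now has a seekable predicate, though with ~93% of the table qualifying, two seeks are unlikely to beat one scan.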
Or you can add a table hint to force a particular index: WITH (INDEX([indexname])).
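For example (the index name is hypothetical, and forcing it here will usually be slower, since the optimizer picked the scan precisely because ~93% of the rows qualify):

SELECT Sum(COALESCE(InstalledSubtotal, 0)) AS TotalSoldNet,
       BP.BoundProjectId AS ProjectId
FROM BoundProducts BP WITH (INDEX (IX_BoundProducts_IsDeleted)) -- hypothetical index name
WHERE BP.IsDeleted = 0 OR BP.IsDeleted IS NULL
GROUP BY BP.BoundProjectId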
Also be aware that cardinality will determine whether SQL uses a seek or a scan - seeks are quicker, but can require key lookups if the index doesn't cover all the columns required. A good rule of thumb is that if the query will return more than about 5% of the table, SQL will prefer a scan.

Related

Does the index still get used in these situations?

I have some questions about indexes.
First, if I use an indexed column in a WITH clause, does this column still work as an indexed column in the main query?
For example,
WITH TEST AS (
SELECT EMP_ID
FROM EMP_MASTER
)
SELECT *
FROM TEST
WHERE EMP_ID >= '2000'
'EMP_ID' in the 'EMP_MASTER' table is the PK, and the index for EMP_MASTER consists of EMP_ID.
In this situation, does an 'Index Scan' happen in the main query?
Second, if I join two tables and then use the indexed columns from each table in the WHERE clause, does an 'Index Scan' happen?
For example,
SELECT *
FROM A, B
WHERE A.COL1 = B.COL1 AND
A.COL1 > 200 AND
B.COL1 > 100
The index for table A consists of 'COL1', and the index for table B consists of 'COL1'.
In this situation, does an 'Index Scan' happen on each table before the join?
Any advice would be much appreciated.
First, SQL is a declarative language, not a procedural language. That is, a SQL query describes the result set, not the specific processing. The SQL engine uses the optimizer to determine the best execution plan to generate the result set.
Second, Oracle has a reasonable optimizer.
Hence, Oracle will probably use the indexes in these situations. However, you should look at the execution plan to see what Oracle really does.
First, if I use an indexed column in a WITH clause, does this column still work as an indexed column in the main query?
Yes. A CTE (the WITH part) is a query just like any other - and if a query references a physical table column used by an index then the engine will use the index if it thinks it's a good idea.
In this situation, does an 'Index Scan' happen in the main query?
We can't tell from the limited information you've provided. An engine will scan or seek an index based on its heuristics about the distribution of data in the index (e.g. STATISTICS objects) and other information it has, such as cached query execution plans.
In this situation, does an 'Index Scan' happen on each table before the join?
As it's a range query, it probably would make sense for an engine to use an index scan rather than an index seek - but it could also do a table scan and ignore the index if the index isn't selective and specific enough. Also factor in query hints that force reading non-committed data (e.g. for performance and to avoid locking).
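In SQL Server, for example, such a hint might look like the sketch below; it reads uncommitted data, trading consistency for reduced locking:

SELECT *
FROM EMP_MASTER WITH (NOLOCK) -- dirty reads: no shared locks taken
WHERE EMP_ID >= '2000'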

Index scan on SQL update statement

I have the following SQL statement, which I would like to make more efficient. Looking through the execution plan, I can see that there is a Clustered Index Scan on @newWebcastEvents. Is there a way I can make this into a seek? Or are there any other ways I can make the query below more efficient?
declare @newWebcastEvents table (
    webcastEventId int not null primary key clustered with (ignore_dup_key = off)
)
insert into @newWebcastEvents
select wel.WebcastEventId
from WebcastChannelWebcastEventLink wel with (nolock)
where wel.WebcastChannelId = 1178
Update WebcastEvent
set WebcastEventTitle = LEFT(WebcastEventTitle, CHARINDEX('(CLONE)',WebcastEventTitle,0) - 2)
where
WebcastEvent.WebcastEventId in (select webcastEventId from @newWebcastEvents)
The @newWebcastEvents table variable contains only a single column, and you're asking for all rows of that table variable in this where clause:
where
WebcastEvent.WebcastEventId in (select webcastEventId from @newWebcastEvents)
So doing a seek on this clustered index is usually rather pointless - the SQL Server query optimizer will need all columns and all rows of that table variable anyway, so it chooses an index scan.
I don't think this is a performance issue, anyway.
An index seek is useful if you need to pick a very small number of rows (<= 1-2% of the total number of rows) from a large table. Then, going through the clustered index navigation tree and finding those few rows makes a lot more sense than scanning the whole table. But here, with a single int column and 15 rows, seeking is pointless; it will be much faster to just read those 15 int values in a single scan and be done with it.
Update: not sure if it makes any difference in terms of performance, but I personally typically prefer to use joins rather than subselects for "connecting" two tables:
UPDATE we
SET we.WebcastEventTitle = LEFT(we.WebcastEventTitle, CHARINDEX('(CLONE)', we.WebcastEventTitle, 0) - 2)
FROM dbo.WebcastEvent we
INNER JOIN @newWebcastEvents nwe ON we.WebcastEventId = nwe.webcastEventId

Getting RID Lookup instead of Table Scan?

SQL Fiddle: http://sqlfiddle.com/#!3/23cf8
In this query, when I have an In clause on an Id, and then also select other columns, the In is evaluated first, and then the Details column and other columns are pulled in via a RID Lookup:
--In production and in SQL Fiddle, Details is grabbed via a RID Lookup after the In clause is evaluated
SELECT [Id]
,[ForeignId]
,Details
--Generate a numbering(starting at 1)
--,Row_Number() Over(Partition By ForeignId Order By Id Desc) as ContactNumber --Desc because older posts should be numbered last
FROM SupportContacts
Where foreignId In (1,2,3,5)
With this query, the Details are being pulled in via a Table Scan.
With NumberedContacts AS
(
SELECT [Id]
,[ForeignId]
--Generate a numbering(starting at 1)
,Row_Number() Over(Partition By ForeignId Order By Id Desc) as ContactNumber --Desc because older posts should be numbered last
FROM SupportContacts
Where ForeignId In (1,2,3,5)
)
Select nc.[Id]
,nc.[ForeignId]
,sc.[Details]
From NumberedContacts nc
Inner Join SupportContacts sc on nc.Id = sc.Id
Where nc.ContactNumber <= 2 --Only grab the last 2 contacts per ForeignId
;
In SQL Fiddle, the second query actually gets a RID Lookup, whereas in production with a million records it produces a Table Scan (the IN clause eliminates 99% of the rows).
Otherwise the query plan shown in SQL Fiddle is identical; the only difference is that what is a RID Lookup in SQL Fiddle is a Table Scan in production :(
I would like to understand what could cause this behavior. What kinds of things would you look at to help determine why it uses a table scan here?
How can I influence it to use a RID Lookup there?
From looking at operation costs in the actual execution plan, I believe I can get the second query very close in performance to the first query if I can get it to use a RID Lookup. If I don't select the Details column, then the performance of both queries is very close in production. It is only after adding other columns like Details that performance degrades significantly for the second query. When I put it in SQL Fiddle and saw that the execution plan used a RID Lookup, I was surprised but slightly confused...
It doesn't have a clustered index because, in testing with different clustered indexes, there was slightly worse performance for this and other queries. That was before I began adding other columns like Details though, and I can experiment with that more, but I would like to have an understanding of what is going on now before I start shooting in the dark with random indexes.
What if you changed your main index to include the Details column?
If you use:
CREATE NONCLUSTERED INDEX [IX_SupportContacts_ForeignIdAsc_IdDesc]
ON SupportContacts ([ForeignId] ASC, [Id] DESC)
INCLUDE (Details);
then neither a RID lookup nor a table scan would be needed, since your query could be satisfied from just the index itself....
The differences in the query plans will be dependent on the types of indexes that exist and the statistics of the data for those tables in the different environments.
The optimiser uses the statistics (histograms of data frequency, mostly) and the available indexes to decide which execution plan is going to be the quickest.
So, for example, you have noticed that the performance degrades when the Details column is included. This is an almost sure sign that either the Details column is not part of an index, or, if it is part of an index, the data in that column is mostly unique, such that the index accesses would be equivalent (or almost equivalent) to a table scan.
Often when this situation arises, the optimiser will choose a table scan over the index access, as it can take advantage of things like block reads to access the table records faster than perhaps a fragmented read of an index.
To influence the path chosen by the optimiser, you would need to look at possible indexes that could be added or modified to make an index access more efficient, but this should be done with care, as it can adversely affect other queries as well as possibly degrade insert performance.
The other important activity you can do to help the optimiser is to make sure the table statistics are kept up to date and refreshed at a frequency appropriate to the rate of change of the frequency distribution in the table data.
If it's true that 99% of the rows would be omitted when the query uses the relevant index + RID lookup, then the likeliest problem in your production environment is that your statistics are out of date, and the optimiser doesn't realise that ForeignId IN (1,2,3,5) would limit the result set to 1% of the total data.
Here's a good link for discovering more about statistics from Pinal Dave: http://blog.sqlauthority.com/2010/01/25/sql-server-find-statistics-update-date-update-statistics/
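The refresh itself is a one-liner; by default it samples the data, while WITH FULLSCAN reads every row and is more accurate but slower:

UPDATE STATISTICS SupportContacts;
-- or, reading the whole table:
UPDATE STATISTICS SupportContacts WITH FULLSCAN;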
As for forcing the optimiser to follow the correct path WITHOUT updating the statistics, you could use a table hint. If you know which index your plan should be using (one containing the Id and ForeignId columns), stick that in your query as a hint to force the optimiser to use that index:
http://msdn.microsoft.com/en-us/library/ms187373.aspx
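As a sketch (the index name is hypothetical; substitute whichever of your indexes contains ForeignId and Id):

SELECT Id, ForeignId, Details
FROM SupportContacts WITH (INDEX (IX_SupportContacts_ForeignId_Id)) -- hypothetical name
WHERE ForeignId IN (1, 2, 3, 5);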
FYI, if you want the best performance from your second query, use this index and avoid the headache you're experiencing altogether:
create index ix1 on SupportContacts(ForeignID, Id DESC) include (Details);

SQL Server query performance: WHERE clause

I have a table like this with 10,000 rows.
declare #a table
(
id bigint not null,
nm varchar(100) not null,
filter bigint
primary key (id)
)
A SELECT with 4-5 joins takes x seconds. If a WHERE clause is added, it now takes 3x seconds. The WHERE clause:
filter = @filder or
filter is null
I applied a nonclustered index on the column, but I'm only getting about a 10% improvement in performance.
Any tips?
Edit: the performance issue happens when the filter on the column is added. All joins are on primary keys.
I have a few thoughts on this:
Chances are that your joins are joining on table.id - which is a primary key and has an index - bingo - high selectivity (because the values are unique). With it being indexed the optimizer can really optimize access to this table when it is used in joins.
I'm not 100% sure, but either you do not have an index on filter, or it is not selective enough. If you do not have an index, the optimizer will use a table scan. If you do have an index but it is not selective enough, it will use a table scan anyway. Scans are expensive.
Even if you do have an index on filter, the optimizer does not like OR predicates. Basically, when using an OR, the optimizer might end up using an index scan instead of an index seek. Try using this instead: @filder = ISNULL(filter, @filder), as @sut13 suggested.
So, to improve the performance: add an index on filter if you do not have one, and adjust your where clause to not use OR, as I have suggested.
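A sketch of that rewrite against the table from the question (the parameter value is made up, and the rewrite assumes @filder itself is never NULL; note also the caveat in the last answer below about wrapping the column in a function):

declare @filder bigint = 42; -- hypothetical parameter value

select id, nm
from @a
where @filder = ISNULL(filter, @filder); -- same rows as (filter = @filder or filter is null)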
Also:
You shouldn't expect the query with the WHERE filter to perform equal to or better than the query with 4-5 joins. If the query with the joins is more selective and makes better use of indexes, it is going to perform better.
It's likely that the lack of an index on the filter column (based on the structure you described) is leading to a table scan. The only way to be sure is to take a look at the execution plan for the query. That will tell you what the optimizer is doing with the query and usually give you enough information to understand why it's doing that and what you need to do to fix it.
Probably, you need an index on the filter column. But the 'OR filter IS NULL' might lead to a scan anyway, depending on how many null values are in the data.
If you use the ISNULL as outlined, unfortunately, that's a function on the column and will probably (depending on the indexes used and other columns in the WHERE clause that can be used to initially filter the data, etc.) result in a scan and not a seek.

Do indexes work with group functions in Oracle?

I am running the following query.
SELECT Table_1.Field_1,
Table_1.Field_2,
SUM(Table_1.Field_5) BALANCE_AMOUNT
FROM Table_1, Table_2
WHERE Table_1.Field_3 NOT IN (1, 3)
AND Table_2.Field_2 <> 2
AND Table_2.Field_3 = 'Y'
AND Table_1.Field_1 = Table_2.Field_1
AND Table_1.Field_4 = '31-oct-2011'
GROUP BY Table_1.Field_1, Table_1.Field_2;
I have created an index on the columns (Field_1, Field_2, Field_3, Field_4) of Table_1, but the index is not getting used.
If I remove SUM(Table_1.Field_5) from the select clause, then the index is used.
I am confused whether the optimizer is simply not using this index, or whether it's because of the SUM() function I have used in the query.
Please share your explanation.
When you remove the SUM, you also remove Field_5 from the query. All the data needed to answer the query can then be found in the index, which may be quicker than scanning the table. If you added Field_5 to the index, the query with SUM might use the index.
If your query is returning a large percentage of the table's rows, Oracle may decide that doing a full table scan is cheaper than "hopping" between the index and the table's heap (to get the values of Table_1.Field_5).
Try adding Table_1.Field_5 to the index (thus covering the whole query with the index) and see if this helps.
See the Index-Only Scan: Avoiding Table Access article at Use The Index Luke for a conceptual explanation of what is going on.
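A sketch of that covering index (the name is made up; the column order matches the existing index, with Field_5 appended only so the index covers the SUM):

CREATE INDEX table_1_covering_ix
    ON Table_1 (Field_1, Field_2, Field_3, Field_4, Field_5);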
As you mentioned, the presence of the summation function results in the index being overlooked.
There are function-based indexes:
A function-based index includes columns that are either transformed by a function, such as the UPPER function, or included in an expression, such as col1 + col2.
Defining a function-based index on the transformed column or expression allows that data to be returned using the index when that function or expression is used in a WHERE clause or an ORDER BY clause. Therefore, a function-based index can be beneficial when frequently-executed SQL statements include transformed columns, or columns in expressions, in a WHERE or ORDER BY clause.
However, as with everything, function-based indexes have their restrictions:
Expressions in a function-based index cannot contain any aggregate functions. The expressions must reference only columns in a row in the table.
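For illustration only (the column is hypothetical; assume some text column Field_6 exists on the table), a function-based index and a predicate that can use it:

CREATE INDEX table_1_upper_f6_ix ON Table_1 (UPPER(Field_6)); -- hypothetical column

-- Usable because the predicate matches the indexed expression:
SELECT * FROM Table_1 WHERE UPPER(Field_6) = 'SMITH';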
Though I see some good answers here, a couple of important points are being missed.
SELECT Table_1.Field_1,
Table_1.Field_2,
SUM(Table_1.Field_5) BALANCE_AMOUNT
FROM Table_1, Table_2
WHERE Table_1.Field_3 NOT IN (1, 3)
AND Table_2.Field_2 <> 2
AND Table_2.Field_3 = 'Y'
AND Table_1.Field_1 = Table_2.Field_1
AND Table_1.Field_4 = '31-oct-2011'
GROUP BY Table_1.Field_1, Table_1.Field_2;
Saying that having SUM(Table_1.Field_5) in the select clause causes the index not to be used is not correct. Your index on (Field_1, Field_2, Field_3, Field_4) can still be used. But there are problems with your index and your SQL query.
Since your index is only on (Field_1, Field_2, Field_3, Field_4), even if it gets used, the DB will have to access the actual table rows to fetch Field_5 for the SUM. It then depends entirely on the execution plan charted out by the SQL optimizer which option is cost-effective. If the optimizer figures out that a full table scan costs less than using the index, it will ignore the index. With that said, here are the probable problems with your index:
As others have stated, you could simply add Field_5 to the index so that there is no need for a separate table access.
The column order of your index matters a great deal for performance. For example, in your case, giving the order as (Field_4, Field_1, Field_2, Field_3) will be quicker, since you have an equality predicate on Field_4 (Table_1.Field_4 = '31-oct-2011'). Think of it this way:
Table_1.Field_4 = '31-oct-2011' leaves you far fewer candidate rows to pick the final result from than Table_1.Field_3 NOT IN (1, 3). Things might change since you are doing a join; it's always best to look at the execution plan and design your index/SQL accordingly.
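Combining both points, a sketch (with a made-up name) that leads with the equality column and includes Field_5 so the whole query is covered:

CREATE INDEX table_1_f4_first_ix
    ON Table_1 (Field_4, Field_1, Field_2, Field_3, Field_5);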