sql server query performance where clause - sql

I got a table like with 10 000 lines.
declare #a table
(
id bigint not null,
nm varchar(100) not null,
filter bigint
primary key (id)
)
A select, with 4-5 join, is taking x seconds. If a where clause is added, it's now taking 3x seconds. The where clause:
filter = #filder or
filter is null
I applied a nonclustered index on the column, but I'm getting only 10% on perfomance.
Any tips?
edit: the perfomance issue happens when the filter column is added. all joins are on primary keys.

I have a few thoughts on this:
Chances are that your joins are joining on table.id - which is a primary key and has an index - bingo - high selectivity (because the values are unique). With it being indexed the optimizer can really optimize access to this table when it is used in joins.
I'm not 100% sure but - either you do not have an index on filter or it is not selective enough. If you do not have an index - the optimizer will use a table scan. If you do have an index, but it is not selective enough, it will use a table scan anyways. Scans are expensive.
Even if you do have an index on filter The optimizer does not like OR predicates. Basically when using an OR the optimizer might end up using an index scan instead of an index seek. Try using this instead: #filder = ISNULL(Filter, #filder as #sut13 suggested.
So to improve the performance: add an index on filter if you do not have one and adjust your where clause to not use OR as I have suggested.
Also:
You shouldn't expect the query with the where filter to perform equal to or better than the query with 4-5 joins. If the query with the joins is more selective and makes better use of indexes it is going to perform better

It's likely that the lack of index (based on what you described on the structure) on the filter column is leading to a table scan. The only way to be sure is to take a look at the execution plan for the query. That will tell you what the optimizer is doing with the query and usually give you enough information so you can understand why it's doing that and what you need to do to fix it.
Probably, you need an index on the filter column. But, with the 'OR filter IS NULL' might lead to a scan anyway, depending on how many null values are in the data.
If you use the ISNULL as outlined, unfortunately, that's a function on the column and will probably (depending on the indexes used and other columns in the WHERE clause that can be used to initially filter the data, etc.) result in a scan and not a seek.

Related

Performance Tuning SQL

I have the following sql. When I check the execution plan of this query we observe an index scan. How do I replace it with index seek. I have non clustered index on IdDeleted column.
SELECT Cast(Format(Sum(COALESCE(InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet,
BP.BoundProjectId AS ProjectId
FROM BoundProducts BP
WHERE BP.IsDeleted=0 or BP.IsDeleted is null
GROUP BY BP.BoundProjectId
I tried like this and got index seek, but the result was wrong.
SELECT Cast(Format(Sum(COALESCE(InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet,
BP.BoundProjectId AS ProjectId
FROM BoundProducts BP
WHERE BP.IsDeleted=0
GROUP BY BP.BoundProjectId
Can anyone kindly suggest me to get the right result set with index seek.
I mean how to I replace (BP.IsDeleted=0 or BP.IsDeleted is null) condition to make use of index seek.
Edit, added row counts from comments of one of the answers:
null: 254962 rows
0: 392002 rows
1: 50211 rows
You're not getting an index seek because you're fetching almost 93% of the rows in the table and in that kind of scenario, just scanning the whole index is faster and cheaper to do.
If you have performance issues, you should look into removing format() -function, especially if the query returns a lot of rows. Read more from this blog post
Other option might be to create an indexed view and pre-aggregate your data. This of course adds an overhead to update / insert operations, so consider that only if this is done really often vs. how often the table is updated.
have you tried a UNION ALL?
select ...
Where isdeleted = 0
UNION ALL
select ...
Where isdeleted is null
Or you can add an Query hint (index= [indexname])
also be aware that the cardinality will determine if SQL uses a seek or a scan - seeks are quicker but can require key lookups if the index doesnt cover all the columns required. A good rule of thumb is that if the query will return more than 5% of the table then SQL will prefer a scan.

Index scan on SQL update statement

I have the following SQL statement, which I would like to make more efficient. Looking through the execution plan I can see that there is a Clustered Index Scan on #newWebcastEvents. Is there a way I can make this into a seek? Or are there any other ways I can make the below more efficient?
declare #newWebcastEvents table (
webcastEventId int not null primary key clustered (webcastEventId) with (ignore_dup_key=off)
)
insert into #newWebcastEvents
select wel.WebcastEventId
from WebcastChannelWebcastEventLink wel with (nolock)
where wel.WebcastChannelId = 1178
Update WebcastEvent
set WebcastEventTitle = LEFT(WebcastEventTitle, CHARINDEX('(CLONE)',WebcastEventTitle,0) - 2)
where
WebcastEvent.WebcastEventId in (select webcastEventId from #newWebcastEvents)
The #newWebcastEvents table variable only contains the only single column, and you're asking for all rows of that table variable in this where clause:
where
WebcastEvent.WebcastEventId in (select webcastEventId from #newWebcastEvents)
So doing a seek on this clustered index is usually rather pointless - SQL Server query optimizer will need all columns, all rows of that table variable anyway..... so it chooses an index scan.
I don't think this is a performance issue, anyway.
An index seek is useful if you need to pick a very small number of rows (<= 1-2% of the original number of rows) from a large table. Then, going through the clustered index navigation tree and finding those few rows involved makes a lot more sense than scanning the whole table. But here, with a single int column and 15 rows --> it's absolutely pointless to seek, it will be much faster to just read those 15 int values in a single scan and be done with it...
Update: no sure if it makes any difference in terms of performance, but I personally typically prefer to use joins rather than subselects for "connecting" two tables:
UPDATE we
SET we.WebcastEventTitle = LEFT(we.WebcastEventTitle, CHARINDEX('(CLONE)', we.WebcastEventTitle, 0) - 2)
FROM dbo.WebcastEvent we
INNER JOIN #newWebcastEvents nwe ON we.WebcastEventId = nwe.webcastEventId

Why am I getting a Clustered Index Scan when the column is indexed?

So, we have a table, InventoryListItems, that has several columns. Because we're going to be looking for rows at times based on a particlar column (g_list_id, a foreign key), we have that foreign key column placed into a non-clustered index we'll call MYINDEX.
So when I search for data like this:
-- fake data for example
DECLARE #ListId uniqueidentifier
SELECT #ListId = '7BCD0E9F-28D9-4F40-BD67-803005179B04'
SELECT *
FROM [dbo].[InventoryListItems]
WHERE [g_list_id] = #ListId
I expected that it would use the MYINDEX index to find just the needed rows, and then look up the information in those rows. So not as good as just finding everything we need in the index itself, but still a big win over doing a full scan of the table.
But instead it seems that I'm still getting a clustered index scan. I can't figure out why that would happen.
If I do something like SELECTing only the values in the included columns of the index, it does what I would expect, an index seek, and just pulls everything from the index.
But if I SELECT *, why does it just bail on the index and do a scan when it seems like it would still benefit greatly from using it because it's referenced in the WHERE clause?
Since you're doing a SELECT * and thus you retrieve all columns, SQL Server's query optimizer may have decided it's easier and more efficient to just do a clustered index scan - since it needs to go to the clustered index leaf level to get all the columns anyway (and doing a seek first, and then a key lookup to actually get the whole data page, is quite an expensive operation - scan might just be more efficient in this setup).
I'm almost sure if you try
SELECT g_list_id
FROM [dbo].[InventoryListItems]
WHERE [g_list_id] = #ListId
then there will be an index seek instead (since you're only retrieving a single column - not everything).
That's one of the reasons why I would recommend to be extra careful when using SELECT * .... - try to avoid it if ever possible.

Does indexes work with group function in oracle?

I am running following query.
SELECT Table_1.Field_1,
Table_1.Field_2,
SUM(Table_1.Field_5) BALANCE_AMOUNT
FROM Table_1, Table_2
WHERE Table_1.Field_3 NOT IN (1, 3)
AND Table_2.Field_2 <> 2
AND Table_2.Field_3 = 'Y'
AND Table_1.Field_1 = Table_2.Field_1
AND Table_1.Field_4 = '31-oct-2011'
GROUP BY Table_1.Field_1, Table_1.Field_2;
I have created index for columns (Field_1,Field_2,Field_3,Field_4) of Table_1 but the index is not getting used.
If I remove the SUM(Table_1.Field_5) from select clause then index is getting used.
I am confused if optimizer is not using this index or its because of SUM() function I have used in query.
Please share your explaination on the same.
When you remove the SUM you also remove field_5 from the query. All the data needed to answer the query can then be found in the index, which may be quicker than scanning the table. If you added field_5 to the index the query with SUM might use the index.
If your query is returning the large percentage of table's rows, Oracle may decide that doing a full table scan is cheaper than "hopping" between the index and the table's heap (to get the values in Table_1.Field_5).
Try adding Table_1.Field_5 to the index (thus covering the whole query with the index) and see if this helps.
See the Index-Only Scan: Avoiding Table Access at Use The Index Luke for conceptual explanation of what is going on.
As you mentioned, the presence of the summation function results in the the Index being overlooked.
There are function based indexes:
A function-based index includes columns that are either transformed by a function, such as the UPPER function, or included in an expression, such as col1 + col2.
Defining a function-based index on the transformed column or expression allows that data to be returned using the index when that function or expression is used in a WHERE clause or an ORDER BY clause. Therefore, a function-based index can be beneficial when frequently-executed SQL statements include transformed columns, or columns in expressions, in a WHERE or ORDER BY clause.
However, as with all, function based indexes have their restrictions:
Expressions in a function-based index cannot contain any aggregate functions. The expressions must reference only columns in a row in the table.
Though I see some good answers here couple of important points are being missed -
SELECT Table_1.Field_1,
Table_1.Field_2,
SUM(Table_1.Field_5) BALANCE_AMOUNT
FROM Table_1, Table_2
WHERE Table_1.Field_3 NOT IN (1, 3)
AND Table_2.Field_2 <> 2
AND Table_2.Field_3 = 'Y'
AND Table_1.Field_1 = Table_2.Field_1
AND Table_1.Field_4 = '31-oct-2011'
GROUP BY Table_1.Field_1, Table_1.Field_2;
Saying that having SUM(Table_1.Field_5) in select clause causes index not to be used in not correct. Your index on (Field_1,Field_2,Field_3,Field_4) can still be used. But there are problems with your index and sql query.
Since your index is only on (Field_1,Field_2,Field_3,Field_4) even if your index gets used DB will have to access the actual table row to fetch Field_5 for applying filter. Now it completely depends on the execution plan charted out of sql optimizer which one is cost effective. If SQL optimizer figures out that full table scan has less cost than using index it will ignore the index. Saying so I will now tell you probable problems with your index -
As others have states you could simply add Field_5 to the index so that there is no need for separate table access.
Your order of index matters very much for performance. For eg. in your case if you give order as (Field_4,Field_1,Field_2,Field_3) then it will be quicker since you have equality on Field_4 -Table_1.Field_4 = '31-oct-2011'. Think of it this was -
Table_1.Field_4 = '31-oct-2011' will give you less options to choose final result from then Table_1.Field_3 NOT IN (1, 3). Things might change since you are doing a join. It's always best to see the execution plan and design your index/sql accordingly.

is count(indexed column) faster than count(*)? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Performance of COUNT SQL function
Hi all,
I've very large tables and I need to know number of records in each , My question is does it reduce the run time if I run :
select count(indexed column like my PK) from tbTest
instead of
select count(*) from tbTest
see Performance of COUNT SQL function
The important thing to note is they are not equivalent
Since the question is whether or not there is a performance difference, it would depend on the index. When you do COUNT(*), it will use the PK column(s) to determine the number of rows. If you do not have any indexes besides a clustered index on the PK column(s), it will scan the leaf nodes on the clustered index. That's probably a lot of pages. If you have a non clustered index that is skinnier than the clustered index, it will choose that instead, resulting in less reads.
So, if the column you select is contained in the smallest possible non-clustered index on the table, the SQL query optimizer will choose that for both count() (if you have a clustered ix that is the PK) and count(indexed_column). If you choose a count(indexed_col) that is only contained in a wide index, then the count() will be faster if your PK is a clustered index. The reason this works is that there is a pointer to the clustered index in all non-clustered indexes and SQL Server can figure out the number of rows based on that non-clustered index.
So, as usual in SQL Server, it depends. Do a showplan and compare the queries to each other.
SELECT COUNT(*) may be faster. That is because using * gives the optimizer liberty to choose any column to count on. Say you have a primary key on a INT column, and a non clustered key on a different bigint column. But the primary key is likely the clustered index, and as such it is in fact significantly larger than the nonclustered bigint index (has more pages). So if the optimizer is free to choose the bigint non-clustered index, it can return the response faster. Possible much faster, depending on the table.
So overall is always better to leave it as COUNT(*) and let the optimizer choose.
most likely, if the query scans the index instead of the whole table.
it is an easy thing to test, become your own scientist.
Both are identical. If you look at the query execution plan for both, both will do an "index scan"