I'm dealing with a query that is taking very long to execute. After capturing the query execution plan and live query statistics, I found that there is a huge difference between the estimated and actual number of rows. Why could this happen?
Need help to optimize that query.
Regards.
RE: "Need help to optimize that query."
The query plan shows two clustered index scans. If those are large tables, that could be a very big slowdown.
The query plan also shows a recommended missing index to be created.
Start with creating the recommended index and see if DRIVER_ALLOCATIONS clustered index scan converts to a seek. My guess is that -- after the recommended index is added -- the next query plan will show another missing index for the other clustered index scan.
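If it is easier than reading them off the graphical plan, the same missing-index suggestions can also be pulled from the DMVs; here is a minimal sketch (it lists suggestions server-wide since the last restart, so filter for the table you care about):

-- Missing-index suggestions recorded by the optimizer, highest estimated impact first.
SELECT d.statement AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
  ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
  ON s.group_handle = g.index_group_handle
ORDER BY s.avg_user_impact DESC;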
Related
In one of the projects I am working on, there is a table with about one million records. For better performance I created a non-clustered index with the sid field as the index key column. When I execute this query:
SELECT [id]
,[sid]
,[idm]
,[origin]
,[status]
,[pid]
FROM [EpollText_Db].[dbo].[PhoneNumbers] where sid = 9
The execution plan is like the picture above. My question is: why does SQL Server ignore the sid index and scan all one million records to find the query result instead? Your help is greatly appreciated.
I believe that the problem is the size of your result. You are selecting ten thousand records from your database, which is quite a lot when you consider the query plan that an index seek would require. The plan that includes an index seek would look something like this:
Ten thousand key lookups would therefore be included, along with a significant number of random logical accesses. Because of this, if your table rows are small, the optimizer could decide to use a clustered index scan. If you are really concerned about the performance of this query, create a covering index:
CREATE INDEX idx_PhoneNumbers_sid
ON [EpollText_Db].[dbo].[PhoneNumbers](sid)
INCLUDE ([id],[idm],[origin],[status],[pid])
However, this may slow down inserts, deletes, and updates, and it may also double the size of your table.
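To see the trade-off described above for yourself, one option is to force the original narrow index and compare its key-lookup plan and IO against the covering index; a sketch follows, where idx_sid is an assumed name for the original single-column index:

SET STATISTICS IO ON;
SELECT id, sid, idm, origin, status, pid
FROM EpollText_Db.dbo.PhoneNumbers WITH (INDEX (idx_sid))  -- hypothetical name of the sid-only index
WHERE sid = 9;
SET STATISTICS IO OFF;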
I am joining two tables called Zasilka and Kapitola. Each one has a clustered index, and Kapitola also has a non-clustered index on the column I am joining on.
The query uses index seek because it expects only 1 row to be returned.
The statistics on both tables are updated.
I have tried disabling the index; then the plan uses a merge join, but it first has to sort about 40,000 rows, which takes a lot of resources.
The index column is mostly ordered, but there are some cases where it is not. I am trying to work out the best strategy for joining these tables while avoiding both the sort and the seek.
And I do not know why it does not use the non-clustered index to join using a merge join.
Execution plan
IO statistics (seek)
IO statistics (merge)
You are misreading the information in the showplan, I believe. The estimate is per execution of the subtree. It estimates it will return 1 row per subtree execution and that it will execute the subtree 71,000 times. (It doesn't estimate less than one). Due to the containment assumption, it believes it will find a row when seeking (assumption of the optimizer based on usual customer behavior). In actuality, you get 46,000 rows or so back. So, the optimizer is working as expected in this case.
In the future, please post the query text, schema, and the whole plan shape. It is very hard to do more than guess when you post a screenshot with most of the plan shape covered up.
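If you still want to compare the nested-loops seek plan against the merge join you have in mind, a hypothetical sketch is below; since the query text was not posted, the column list and the join column ZasilkaID are assumptions:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT z.*, k.*
FROM dbo.Zasilka AS z
JOIN dbo.Kapitola AS k
  ON k.ZasilkaID = z.ZasilkaID   -- assumed join column
OPTION (MERGE JOIN);             -- hint used only for the comparison; drop it afterwards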
I have a query that runs from MS Access, and the non-clustered index scan in its execution plan shows a cost of 100% in SSMS. This query has the second-highest processing time, which means it slows down our application when it executes. I am wondering if anyone knows what I can do to reduce the cost of this non-clustered index scan. Below are the query that is calling this table, the indexes on this table, and the execution plan.
SELECT "ContactID" ,"ContactName" FROM "dbo"."Contacts"
Here is the execution plan:
Here are the indexes that are in the table that this slow query is referencing:
Your query has no WHERE clause, so there's no way to avoid scanning the whole table, since you're asking for every row to be returned.
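If the goal is just to trim IO rather than eliminate the scan, one option (a sketch; the index name is made up) is a narrow non-clustered index covering only the two selected columns, so the engine scans a much smaller structure than the full clustered index:

CREATE NONCLUSTERED INDEX IX_Contacts_ContactName  -- hypothetical name
ON dbo.Contacts (ContactName)
INCLUDE (ContactID);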
I'm running a query that takes 2 seconds but should perform better, so I looked at the execution plan details in SQL Server Management Studio and found a step whose cost is 70%.
Then I right-clicked the item and found an option called "Missing Index Details"; after I clicked it, a query with a recommendation was generated:
/*
Missing Index Details from SQLQuery15.sql - (local).application_prod (appprod (58))
The Query Processor estimates that implementing the following index could improve the query cost by 68.8518%.
*/
/*
USE [application_prod]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[cloud_document] ([isactivedocument])
INCLUDE ([objectuid])
GO
*/
So my question is: what exactly happens if I execute that query? Is it going to affect my database, and are there any drawbacks or side effects after applying it?
Thanks a lot; any help is appreciated in advance.
Running the query will create an index on the specified table (cloud_document).
This should improve read performance and reduce the query time.
It also affects the performance of INSERT/UPDATE/DELETE statements, as the indexes need to be maintained during those statements.
The decision to use indexing, how many indexes and what the index consists of is more an art than an exact science.
The actual maintenance of indexes (defragmenting and updating statistics) is something that can be automated, but it should be left alone until you have a better understanding of what indexes are and what they do.
I would recommend that you start reading some documentation on indexing.
You may want to start with Stairway to SQL Server Indexes.
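If you do decide to run the recommendation, the only change needed is a real index name in place of the placeholder; a minimal sketch (the name IX_cloud_document_isactivedocument is made up) looks like this:

USE [application_prod]
GO
CREATE NONCLUSTERED INDEX [IX_cloud_document_isactivedocument]  -- hypothetical name replacing the placeholder
ON [dbo].[cloud_document] ([isactivedocument])
INCLUDE ([objectuid])
GO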
The literal meaning is that it is telling you to build an index on isactivedocument of the table [dbo].[cloud_document], from which I assume you are using isactivedocument as a condition to filter the table while selecting the column objectuid. Something like:
select objectuid, ... from [dbo].[cloud_document] where isactivedocument = 0/1
Note that the "Clustered Index Scan (70%)" doesn't mean it's a problem of the clustered index. It means you don't have index on isactivedocument, then sql engine has to scan clustered index to find what you want. That's why it's taking so much pressure. Once you create index on isactivedocument, check out the plan again, and you'll see the node becomes "index seek", which is a much faster way to find out what you want.
Usually if your database stress mainly comes from query, new index doesn't do much harm to your system. Just go ahead and create the index. But of course you need to keep index quantities as less as possible.
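To confirm the seek, one quick check (a sketch; the literal 1 for isactivedocument is an assumption, since the original filter value isn't shown) is to compare logical reads before and after creating the index:

SET STATISTICS IO ON;
SELECT objectuid
FROM dbo.cloud_document
WHERE isactivedocument = 1;  -- assumed filter value
SET STATISTICS IO OFF;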
I've got a stored procedure in SQL Server 2005, and when I run it and look at its execution plan I notice it's doing a Clustered Index Scan, which costs 84%. I've read that I have to modify some things to get a Clustered Index Seek there, but I don't know what to modify.
I'll appreciate any help with this.
Thanks,
Brian
Without any detail it is hard to guess what the problem is, or even whether there is a problem at all. The choice of a scan instead of a seek can be driven by many factors:
The query expresses a result set that covers the entire table, i.e. the query is a simple SELECT * FROM <table>. This is a trivial case that would be perfectly covered by a clustered index scan, with no need to consider anything else.
The optimizer has no alternatives:
The query expresses a subset of the entire table, but the filtering predicate is on columns that are not part of the clustered key and there are no non-clustered indexes on those columns either. There is no alternative plan other than a full scan.
The query has filtering predicates on columns in the clustered index key, but they are not SARGable. The filtering predicate usually needs to be rewritten to make it SARGable; the proper rewrite depends on the case (a sketch follows this answer). A more subtle problem can appear due to implicit conversion rules, e.g. the filtering predicate is WHERE column = @value but column is VARCHAR (ASCII) and @value is NVARCHAR (Unicode).
The query has SARGable filtering predicates on columns in the clustered key, but the leftmost column is not filtered, i.e. the clustered index is on columns (foo, bar) but the WHERE clause is on bar alone.
The optimizer chooses a scan.
When the alternative is a non-clustered index scan (or range seek) but the choice is to use the clustered index, the cause can usually be tracked down to the index tipping point, due to the non-clustered index not covering the query projection. Note that this is not your question, since you expect a clustered index seek, not a non-clustered index seek (assuming the question is 100% accurate and documented...).
Cardinality estimates. The query cost estimate is based on the clustered index key statistics, which provide an estimate of the cardinality of the result (i.e. how many rows will match). On a simple query this cannot happen, as any estimate for a seek or range seek will be lower than the one for a scan, no matter how far off the statistics are. But on a complex query, with joins and filters on multiple tables, things are more complicated, and the plan may include a scan where a seek was expected, because the query optimizer may choose a plan in which the join evaluation order is reversed from what the observer expects. The reversed order may be correct (most of the time) or may be problematic (usually due to obsolete statistics or parameter sniffing).
An ordering guarantee. A scan will produce results in a guaranteed order, and elements higher in the execution tree may benefit from this order (e.g. a sort or spool may be eliminated, or a merge join can be used instead of a hash or nested loops join). Overall the query cost is better as a result of choosing an apparently slower access path.
These are some quick pointers as to why a clustered index scan may be present when a clustered index seek is expected. The question is extremely generic, and it is impossible to answer the 'why' beyond consulting an 8-ball. Now, if I take your question to be properly documented and correctly articulated, then expecting a clustered index seek means you are searching for a unique record based on a clustered key value. In that case the problem has to be with the SARGability of the WHERE clause.
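As an illustration of the SARGability point above, here is a sketch with made-up table and column names (not the poster's schema):

-- Non-SARGable: wrapping the column in a function hides it from the index,
-- so the optimizer falls back to a scan.
SELECT OrderID FROM dbo.Orders WHERE YEAR(OrderDate) = 2023;

-- SARGable rewrite: the column is left bare, so a seek on an index over
-- OrderDate becomes possible.
SELECT OrderID FROM dbo.Orders
WHERE OrderDate >= '20230101' AND OrderDate < '20240101';

-- Implicit conversion variant: if OrderCode is VARCHAR and the parameter is
-- NVARCHAR, the column side gets converted and the seek is defeated;
-- declaring the parameter as VARCHAR avoids it.
DECLARE @code varchar(20) = 'A-100';
SELECT OrderID FROM dbo.Orders WHERE OrderCode = @code;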
If the query includes more than a certain percentage of the rows in the table, the optimizer will elect to do a scan instead of a seek, because it predicts that the scan will require fewer disk IOs: for a seek it needs one disk IO per level in the index for each row it returns, whereas (in this rough model) a scan costs about one disk IO per row in the entire table.
So if there are, say, 5 levels in the b-tree index, then once the query returns more than 20% of the rows in the table, it is cheaper to read the whole table than to make 5 IOs for each of the 20% of rows...
Can you narrow the output of the query a bit more, to reduce the number of rows returned by this step in the process? That would help it choose the seek over the scan.
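For reference, the number of b-tree levels used in that back-of-the-envelope calculation can be read directly; a sketch with placeholder object and index names:

-- Returns the depth of the named index, i.e. the 'levels' figure above.
SELECT INDEXPROPERTY(OBJECT_ID('dbo.YourTable'), 'IX_YourIndex', 'IndexDepth') AS IndexDepth;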